Containers vs Serverless vs Virtual Machines [closed]

I have done some research on Containers, Serverless, and Virtual Machines. Each of these has its own benefits in terms of cost, deployment, reliability, etc., but I am still confused about when to use each one and in what kinds of situations.

Virtual Machines (Hypervisor)
This straightforwardly emulates a whole machine: the virtual machine runs its own OS and is given a predefined amount of resources by the host machine's OS.
Reliability: High
Cost: High
Pros:
More isolation (hypervisor + full guest OS), making it harder to compromise the host machine in the event a virtual machine is hijacked. You can have as many as you can manually allocate resources for.
Cons:
They consume a set amount of resources from the host machine while they are on, increasing cost.
They are a bit trickier to deploy and orchestrate because of the above.
Containers
These provide an isolated OS environment, but they run as processes directly on the host machine's OS and are built to be lightweight, with a single purpose.
Reliability: High
Cost: Medium
Pros:
They can be started, stopped, frozen and overall controlled very easily with the use of orchestration, allowing for more optimal usage of resources in the host machine.
They are very malleable, which means you can create a container for a specific type of operation you want and call upon it for any given task.
Deployment is extremely fast thanks to the above, making it less painful to kill host machines when they aren't in use.
Cons:
They have less isolation, which means vulnerabilities may compromise the host machine more easily in the event of an intrusion.
Serverless
This is a niche concept that, contrary to its name, still involves a server. Its strength is its deployment model, which focuses on tiny requests that are simple albeit numerous.
Reliability: Yes?
Cost: Very Low to Absurdly High
Pros:
The idea behind it is executing a single tiny function that integrates with a system already in place. Clients send requests to a gateway, which triggers the serverless function, and they get a response (a rough sketch of such a function follows this section).
Individually cheap requests that would otherwise need a dedicated server to receive and execute en masse can be handled by Serverless.
You pay for the time used to execute these functions, which should run very fast, so it scales very well.
Cons:
Works in tandem with other things so it is not a silver bullet by any means.
Poorly optimized functions or even poorly thought-out requirements can quickly drive up cost.
Limited technology availability (the provider dictates which technologies you may call upon in Serverless; anything else is a "jury-rig").
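To make the gateway-plus-function flow above concrete, here is a minimal sketch in Go using the aws-lambda-go SDK behind an API Gateway; the greeting logic and the query parameter name are made up for illustration, and other providers have equivalent SDKs:

```go
package main

import (
	"context"
	"fmt"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
)

// handler is invoked by the platform for each request the API Gateway
// forwards to this function; we never manage a server process ourselves.
func handler(ctx context.Context, req events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
	name := req.QueryStringParameters["name"] // hypothetical query parameter
	if name == "" {
		name = "world"
	}
	return events.APIGatewayProxyResponse{
		StatusCode: 200,
		Body:       fmt.Sprintf("hello, %s", name),
	}, nil
}

func main() {
	// Hand control to the serverless runtime; billing is per invocation
	// and execution time rather than per provisioned server.
	lambda.Start(handler)
}
```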

Related

Benefit of running Kubernetes in bare metal and cloud with idle VM or machines?

I want to know the high-level benefit of running Kubernetes on bare metal machines.
So let's say we have 100 bare metal machines ready, with kubelet deployed on each. Doesn't that mean that when the application only runs on 10 machines, we are wasting the remaining 90 machines, which just stand by without being used for anything?
For cloud, does Kubernetes launch new VMs as needed, so that clients do not pay for idle machines?
How does Kubernetes handle the extra machines that are needed at the moment?
Yes, if you have 100 bare metal machines and use only 10, you are wasting money. You should only deploy the machines you need.
The Node Autoscaler works with certain cloud providers such as AWS, GKE, or OpenStack-based infrastructures.
Now, the Node Autoscaler is useful if your load is not very predictable and/or scales up and down widely over a short period of time (think Jobs or cyclic loads like a Netflix-type use case).
If you're running services that just need to scale gradually as your customer base grows, it is not so useful, since it is just as easy to add new nodes manually.
Kubernetes will handle some amount of auto-scaling with an assigned number of nodes (i.e. you can run many Pods on one node, and you would usually pick your machines to run in a safe range, but still allow handling of spikes in traffic by spinning up more Pods on those nodes).
As a side note: with bare metal, you typically gain in performance, since you don't have the overhead of a VM / hypervisor, but you need to supply distributed storage, which a cloud provider would typically provide as a service.

Can CUDA permanently damage the GPU? [closed]

I have not yet gotten into GPGPU programming, so I do not know the exact specifics of CUDA (OpenCL), but assuming that the GPU is in an optimal (well cooled, ...) environment: can pure CUDA (OpenCL) code permanently damage the GPU? Is the GPGPU programming system robust enough to recover from all errors in the code?
I have seen this question, but that one was posted because eaponte had a specific problem that needed to be resolved. I am asking in a more general fashion.
Thanks a lot for your input.
After using Nvidia Tesla cards in development and production environments shared across many users over the last few years, I have not seen any "physically damaged" cards in this time due to "bad programming". So to answer the first question empirically: I guess if a card (even a GTX) is able to run Crysis at max settings without catching fire, it should also survive your OpenCL/CUDA kernel under high load. Yes, vendors usually do take care of heat levels and reduce clocking, etc., as you know it from your CPU. Nevertheless, system manufacturers need a certification to make sure they can handle the produced heat, especially in multi-GPU systems.
But of course there have been several pieces of code in the wild that damaged all kinds of hardware in the past, and this can certainly happen to GPUs too - but I have never read about a specific, code-driven case, although it would be an interesting research question.
Generally, GPUs can be damaged like any piece of silicon by simply using them. That happens from time to time, e.g., due to transistor ageing or overheating by bad cooling. We also replaced suddenly failing GPUs after several years in service the same way as we replace CPUs.
Since the initial question is rather broad, one more addition: today's CPUs/GPUs/APUs/... contain so many transistors, and production processes are so complex, that it is often the case that not all components of a chip are actually usable (see: the PS3/Cell processor with 7 of 8 SPEs active; enterprise vs. gaming products, etc.). We actually had a case last year where a driver update turned some previously "working" GPUs into GPUs that produced many double ECC errors. That was fixed again with another driver update and only affected cards from early production cycles of a specific generation. One idea we speculated about was that the broken driver did not mask "unusable" parts of the RAM correctly, a behaviour that is otherwise transparent to the end customer.
Is the GPGPU programming system robust enough to recover from all errors in the code?
I can certainly answer this question with no. During development we very often see cases where frequent and "brutal" segfaulting in a kernel crashes the driver. A full reboot of the host system is usually the only way we can recover in such a situation and make that specific GPU usable again.

ELI5: How etcd really works and what a consensus algorithm is

I am having a hard time grasping what etcd (in CoreOS) really does, because all this "distributed key-value storage" stuff seems intangible to me. Reading further into etcd, it delves into the Raft consensus algorithm, and then it becomes really confusing to understand.
Let's put it this way: what happens if a cluster system doesn't have etcd?
Thanks for your time and effort!
As someone with no CoreOS experience who is building a distributed system using etcd, I think I can shed some light on this.
The idea with etcd is to give some very basic primitives that are applicable for building a wide variety of distributed systems. The reason for this is that distributed systems are fundamentally hard. Most programmers don't really grok the difficulties simply because there are orders of magnitude more opportunity to learn about single-system programs; this has really only started to shift in the last 5 years since cloud computing made distributed systems cheap to build and experiment with. Even so, there's a lot to learn.
One of the biggest problems in distributed systems is consensus. In other words, guaranteeing that all nodes in a system agree on a particular value. Now, if hardware and networks were 100% reliable then it would be easy, but of course that is impossible. Designing an algorithm to provide some meaningful guarantees around consensus is a very difficult problem, and one that a lot of smart people have put a lot of time into. Paxos was the previous state-of-the-art algorithm, but was very difficult to understand. Raft is an attempt to provide similar guarantees but be much more approachable to the average programmer. However, even so, as you have discovered, it is non-trivial to understand its operational details and applications.
In terms of what etcd is specifically used for in CoreOS I can't tell you. But what I can say with certainty is that any data which needs to be shared and agreed upon by all machines in a cluster should be stored in etcd. Conversely, anything that a node (or subset of nodes) can handle on its own should emphatically not be stored in etcd (because it incurs the overhead of communicating and storing it on all nodes).
With etcd it's possible to have a large number of identical machines automatically coordinate, elect a leader, and guarantee an identical history of data in its key-value store such that:
No etcd node will ever return data which is not agreed upon by the majority of nodes.
For a cluster of size x, any number of machines greater than x/2 can continue operating and accepting writes even if the others die or lose connectivity.
Any machines that lose connectivity (e.g. due to a netsplit) are guaranteed to continue returning correct historical data, even though they will fail to accept writes.
The key-value store itself is quite simple and nothing particularly interesting, but these properties allow one to construct distributed systems that resist individual component failure and can provide reasonable guarantees of correctness.
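To make those properties a bit more tangible, here is a minimal sketch using etcd's official Go client (clientv3); the endpoint and key name are made up, and error handling is kept deliberately short:

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Connect to one member; the client discovers the rest of the cluster.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// A write only succeeds once a majority of members have committed it.
	if _, err := cli.Put(ctx, "/config/feature-x", "enabled"); err != nil {
		panic(err)
	}

	// A read returns data agreed upon by the majority of nodes.
	resp, err := cli.Get(ctx, "/config/feature-x")
	if err != nil {
		panic(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s = %s\n", kv.Key, kv.Value)
	}
}
```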
etcd is a reliable system for cluster-wide coordination and state management. It is built on top of Raft.
Raft gives etcd a total ordering of events across a system of distributed etcd nodes. This has many advantages and disadvantages:
Advantages include:
any node may be treated like a master
minimal downtime (a client can try another node if one isn't responding)
avoids split-brain scenarios
a reliable way to build distributed locks for cluster-wide coordination
users of etcd can build distributed systems without ad-hoc, buggy, homegrown solutions
For example: You would use etcd to coordinate an automated election of a new Postgres master so that there remains only one master in the cluster (a rough sketch of this follows at the end of this answer).
Disadvantages include:
for safety reasons, it requires a majority of the cluster to commit writes - usually to disk - before replying to a client
requires more network chatter than a single master system
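As a rough sketch of the Postgres-master example mentioned above, etcd's Go client ships a concurrency package with an election helper; the key prefix, node name, and the promotion step here are placeholders:

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// The session keeps a lease alive; if this process dies, its keys
	// (and therefore its leadership) expire automatically.
	session, err := concurrency.NewSession(cli)
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	election := concurrency.NewElection(session, "/postgres/leader")

	// Campaign blocks until this node becomes the leader (or ctx is cancelled).
	if err := election.Campaign(context.Background(), "node-1"); err != nil {
		log.Fatal(err)
	}
	log.Println("elected leader, promoting local Postgres to master")
	// promoteLocalPostgres() // placeholder for whatever promotion means in your setup
}
```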

Can an executable behave differently when run on a Virtualized Server?

Let's say I have a piece of code that runs fine on an OS. Now, if I install that OS on a virtual machine (server virtualization), and run that code on that, is it possible that the code behaves differently?
If so, what are the prerequisites for that? For example, does it have to be compiled machine code (in other words, are interpreted languages safe?)? Does it have to be certain OS instructions? Specific virtualization technology (Xen, KVM, VMware..)?
Also, what are the possible different behaviors?
Yes. Like any machine, the virtual machine is just another computer (implemented in software instead of hardware).
For one, lots of commercial apps will blow up when you run them on a VM due to:
copy protection detecting the VM
copy protection tying itself to your hardware, using undocumented features of the BIOS/kernel/hardware
Secondly, a VM is just another computer consisting of hardware implemented in software instead of circuits/die/microcode/magic. This means the VM must provide the emulated hardware either through pass-through or emulation. The fact that hardware is very diverse can cause all kinds of different behavior. Also note the possible lack of drivers for, or acceleration of, the emulated hardware.
But of course a typical business application for example isn't nearly as likely to rely on any hardware details as all it does is call some GUI API.
Interpreted languages are only safe from this to the extent that they are "interpreted"; if the interpreted language calls out to some native code, all of this is possible again.
For an example of something detecting that it's running under a VM, check this; it's just one of the literally thousands of ways to detect a VM.
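As a concrete (and easily defeated) illustration of such detection, here is a sketch in Go for Linux guests that just reads the DMI vendor string exposed by the firmware, where hypervisors usually identify themselves; the list of vendor hints is illustrative, not exhaustive:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	// On Linux, firmware/DMI identification strings are exposed under
	// /sys/class/dmi/id/. Hypervisors usually report recognizable vendors.
	data, err := os.ReadFile("/sys/class/dmi/id/sys_vendor")
	if err != nil {
		fmt.Println("could not read DMI info:", err)
		return
	}
	vendor := strings.TrimSpace(string(data))

	for _, hint := range []string{"QEMU", "KVM", "VMware", "VirtualBox", "Xen", "Microsoft Corporation"} {
		if strings.Contains(vendor, hint) {
			fmt.Printf("probably running in a VM (sys_vendor = %q)\n", vendor)
			return
		}
	}
	fmt.Printf("no obvious VM hint (sys_vendor = %q)\n", vendor)
}
```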
In theory the program should run exactly the same as on a physical machine.
In practice, however, there may be differences due to:
Machine/OS configuration and drivers
Load of the virtual machine host.
Differences in machine configuration are similar to the differences you would see between any two physical machines. Depending on how critical your application is to the end user, you should run the same set of tests that you would run on a physical box to determine whether the environment is acceptable for use.
Depending on the virtualisation technology, the host may not have the ability to guarantee the client resources at specific times. This can lead to weird behavior on the client. Potentially you would see more occurrences of application errors due to IO timeouts and starvation of memory.
To successfully virtualise an application for production use you need to do a bit of work to understand the resource profile of the application/client and virtual host.

registers vs stacks

What exactly are the advantages and disadvantages to using a register-based virtual machine versus using a stack-based virtual machine?
To me, it would seem as though a register based machine would be more straight-forward to program and more efficient. So why is it that the JVM, the CLR, and the Python VM are all stack-based?
Implemented in hardware, a register-based machine is going to be more efficient simply because there are fewer accesses to the slower RAM. In software, however, even a register based architecture will most likely have the "registers" in RAM. A stack based machine is going to be just as efficient in that case.
In addition a stack-based VM is going to make it a lot easier to write compilers. You don't have to deal with register allocation strategies. You have, essentially, an unlimited number of registers to work with.
Update: I wrote this answer assuming an interpreted VM. It may not hold true for a JIT compiled VM. I ran across this paper which seems to indicate that a JIT compiled VM may be more efficient using a register architecture.
This has already been answered, to a certain level, in the Parrot VM's FAQ and associated documents:
A Parrot Overview
The relevant text from that doc is this:
the Parrot VM will have a register architecture, rather than a stack architecture. It will also have extremely low-level operations, more similar to Java's than the medium-level ops of Perl and Python and the like.
The reasoning for this decision is primarily that by resembling the underlying hardware to some extent, it's possible to compile down Parrot bytecode to efficient native machine language.
Moreover, many programs in high-level languages consist of nested function and method calls, sometimes with lexical variables to hold intermediate results. Under non-JIT settings, a stack-based VM will be popping and then pushing the same operands many times, while a register-based VM will simply allocate the right amount of registers and operate on them, which can significantly reduce the amount of operations and CPU time.
You may also want to read this: Registers vs stacks for interpreter design
Quoting it a bit:
There is no real doubt, it's easier to generate code for a stack machine. Most freshman compiler students can do that. Generating code for a register machine is a bit tougher, unless you're treating it as a stack machine with an accumulator. (Which is doable, albeit somewhat less than ideal from a performance standpoint) Simplicity of targeting isn't that big a deal, at least not for me, in part because so few people are actually going to directly target it--I mean, come on, how many people do you know who actually try to write a compiler for something anyone would ever care about? The numbers are small. The other issue there is that many of the folks with compiler knowledge already are comfortable targeting register machines, as that's what all hardware CPUs in common use are.
Traditionally, virtual machine implementors have favored stack-based architectures over register-based ones due to the simplicity of VM implementation, the ease of writing a compiler back-end (most VMs are originally designed to host a single language), and code density: executables for a stack architecture are invariably smaller than executables for register architectures. The simplicity and code density come at a cost in performance.
Studies have shown that a register-based architecture requires on average 47% fewer executed VM instructions than a stack-based architecture. The register code is 25% larger than the corresponding stack code, but the increased cost of fetching more VM instructions due to the larger code size amounts to only 1.07% extra real-machine loads per VM instruction, which is negligible. Overall, the register-based VM takes, on average, 32.3% less time to execute standard benchmarks.
One reason for building stack-based VMs is that the actual VM opcodes can be smaller and simpler (no need to encode/decode operands). This makes the generated code smaller, and also makes the VM code simpler.
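To make the "smaller and simpler opcodes" point concrete, here is a toy stack-based interpreter sketched in Go (the opcodes and program are made up): only PUSH carries an operand, the arithmetic opcodes implicitly work on the top of the stack, and the whole dispatch loop fits in a few lines:

```go
package main

import "fmt"

// Opcodes for a toy stack machine. Only PUSH carries an operand; ADD and MUL
// implicitly consume (and replace) the top two stack slots.
const (
	PUSH = iota
	ADD
	MUL
	PRINT
	HALT
)

func run(code []int) {
	var stack []int
	for pc := 0; pc < len(code); pc++ {
		switch code[pc] {
		case PUSH:
			pc++
			stack = append(stack, code[pc])
		case ADD:
			n := len(stack)
			stack = append(stack[:n-2], stack[n-2]+stack[n-1])
		case MUL:
			n := len(stack)
			stack = append(stack[:n-2], stack[n-2]*stack[n-1])
		case PRINT:
			fmt.Println(stack[len(stack)-1])
		case HALT:
			return
		}
	}
}

func main() {
	// 2 + 3 * 4  ==>  prints 14. A register machine would encode the same thing
	// in fewer, wider instructions (e.g. MUL r1, 3, 4; ADD r0, 2, r1).
	run([]int{PUSH, 2, PUSH, 3, PUSH, 4, MUL, ADD, PRINT, HALT})
}
```

A register-based VM would fold the operands into each instruction, trading larger opcodes for fewer pushes and pops.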
How many registers do you need?
I'll probably need at least one more than that.
Stack-based VMs are simpler and the code is much more compact. As a real-world example, a friend built (about 30 years ago) a data logging system with a homebrew Forth VM on a COSMAC. The Forth VM was 30 bytes of code on a machine with 2 KB of ROM and 256 bytes of RAM.
It is not obvious to me that a "register-based" virtual machine would be "more straight-forward to program" or "more efficient". Perhaps you are thinking that the virtual registers would provide a short-cut during the JIT compilation phase? This would certainly not be the case, since the real processor may have more or fewer registers than the VM, and those registers may be used in different ways. (Example: values that are going to be decremented are best placed in the ECX register on x86 processors.) If the real machine has more registers than the VM, then you're wasting resources; if it has fewer, you've gained nothing by using "register-based" programming.
Stack based VMs are easier to generate code for.
Register based VMs are easier to create fast implementations for, and easier to generate highly optimized code for.
For your first attempt, I recommend starting with a stack based VM.