Does it make sense to run OpenShift 3 / OKD on bare metal or on virtual machines?
What would be the pros and cons of each?
Wouldn't running it on virtual machines affect overall performance?
Basically, you had better use virtual machines for efficient resource usage, and if you run them on a cloud platform supported by OpenShift, you can also use auto-scaling through the platform's API.
On the other hand, if some tasks need to use the GPU and CPU heavily, you had better run some hosts on bare metal.
You can also mix both kinds of hosts to match your system requirements.
I hope this helps.
I have been tasked with recommending the VM provisioning for an OpenShift production environment. The OpenShift installation documents don't really detail a lot of different options. I know that we want High Availability (which means multiple masters), but some of the things that I'm a bit confused by are:
separate hosts for etcd
infrastructure nodes
Do I need separate hosts/nodes for etcd? (advantages seem to be performance related but would like to better understand)
Do I need separate hosts/nodes for the infrastructure components (registry, router, etc.) or can these just be hosted on the master nodes?
AFAIK etcd can be on the same host as the master, unless you have a really big cluster and want to maintain etcd separately from the OpenShift cluster.
Running routers on dedicated nodes helps provide high availability and reduces the chance of nodes running into health issues caused by other container workloads on the same machine. Applications inside the OpenShift cluster can keep running even if all masters go down (which may be rare), but router nodes need to be available at all times to serve traffic.
There are many reference architectures published by Red Hat; check out blog.openshift.com and also the official docs on redhat.com.
etcd and masters can be installed on the same node or separately. Here you can find some best practices for etcd. As you can see, it is recommended there that etcd be installed separately, and this is what I would suggest if you can "afford" more servers. If not, co-locating masters and etcd can be called symbiotic, in that masters are CPU-intensive whereas etcd uses a lot of disk I/O and memory.
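Whether etcd is co-located with the masters or not, how many etcd members you run is what determines how many host failures the cluster survives, because etcd (being Raft-based) needs a majority of members to agree. A minimal sketch of that majority arithmetic (the function names are just for illustration):

```python
def etcd_quorum(members: int) -> int:
    """Members that must agree for a Raft-based cluster such as etcd to make progress."""
    if members < 1:
        raise ValueError("an etcd cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can be lost while a majority (quorum) still remains."""
    return members - etcd_quorum(members)


for n in (1, 2, 3, 4, 5):
    print(f"{n} members: quorum {etcd_quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

This is why three masters with co-located etcd is a common minimum for HA: three members tolerate one failure, while a fourth member adds no extra tolerance.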
Regarding infrastructure deployments such as routers, the docker-registry, the EFK stack, metrics and so forth, the recommended deployment configuration (within your means) is that masters are not schedulable, so they only serve the API and control the nodes. Then you can split your schedulable nodes into infrastructure and compute nodes.
Infrastructure nodes will only host applications used by the cluster itself or by other applications (e.g. GitLab or Nexus)
Worker/Compute nodes will host business applications
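The infra/compute split is normally done with node labels plus a node selector on each workload. The sketch below only mimics the selector-matching rule (a node qualifies if its labels contain every key/value pair of the selector); the node names and the `node-role` label key are hypothetical, so check which labels your installer actually applies. The non-schedulable masters are simply left out of the inventory.

```python
from typing import Dict, List

# Hypothetical inventory of schedulable nodes; masters are omitted because they are not schedulable.
NODES: Dict[str, Dict[str, str]] = {
    "infra-1": {"node-role": "infra"},
    "infra-2": {"node-role": "infra"},
    "compute-1": {"node-role": "compute"},
    "compute-2": {"node-role": "compute"},
}


def eligible_nodes(selector: Dict[str, str], nodes: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the nodes whose labels include every key/value pair in the selector."""
    return sorted(
        name
        for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


print(eligible_nodes({"node-role": "infra"}, NODES))    # router, registry, EFK, metrics...
print(eligible_nodes({"node-role": "compute"}, NODES))  # business applications
```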
Having a multi-master installation with HA routers is of course the best solution, but then you have to decide how you want to provide this HA: with an external load balancer or with IP failover?
As @debianmaster mentioned, there are several reference architecture documents you can read, such as this one here.
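With the external load balancer option, the balancer sends each request to a router node that currently passes its health check. A minimal sketch of that behaviour, assuming simple round-robin; the router names are hypothetical and `is_healthy` stands in for whatever probe your load balancer really uses. (IP failover works differently: a virtual IP moves between nodes, in OpenShift 3 via keepalived, instead of a balancer choosing per request.)

```python
import itertools
from typing import Callable, Iterable, Iterator, List, Optional


class RouterPool:
    """Round-robin over router nodes, skipping any node that fails its health check."""

    def __init__(self, routers: Iterable[str], is_healthy: Callable[[str], bool]) -> None:
        self._routers: List[str] = list(routers)
        self._is_healthy = is_healthy
        self._cycle: Iterator[str] = itertools.cycle(self._routers)

    def pick(self) -> Optional[str]:
        # Try each router at most once per request.
        for _ in range(len(self._routers)):
            candidate = next(self._cycle)
            if self._is_healthy(candidate):
                return candidate
        return None  # no healthy router left: traffic cannot be served


down = {"router-2"}
pool = RouterPool(["router-1", "router-2", "router-3"], lambda r: r not in down)
print([pool.pick() for _ in range(4)])  # router-2 is skipped while it is down
```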
I was exploring the architecture of Google's IaaS/PaaS offerings, and I am confused as to how GKE (Google Container Engine) runs in Google data centers. From this article (http://www.wired.com/2012/07/google-compute-engine/) and also from some of the Google I/O 2012 sessions, I gathered that GCE (Google Compute Engine) runs the provisioned VMs using KVM (Kernel-based Virtual Machine); these VMs run inside Google's cgroups-based containers (this allows Google to schedule user VMs the same way they schedule their existing container-based workloads, probably using Borg/Omega). Now how does Kubernetes figure into this, given that it makes you run Docker containers on GCE-provisioned VMs, and not on bare metal? If my understanding is correct, then Kubernetes-scheduled Docker containers run inside KVM VMs, which themselves run inside Google cgroups containers scheduled by Borg/Omega...
Also, how does Kubernetes networking fit into Google's existing GCE Andromeda software-defined networking?
I understand that this is a very low-level architectural question, but I feel that understanding the internals will improve my understanding of how user workloads eventually run on bare metal. Also, I'm curious whether running containers on VMs inside containers is necessary from a performance point of view. E.g., doesn't networking performance degrade with multiple layers? Google mentions in its Borg paper (http://research.google.com/pubs/archive/43438.pdf) that they run their container-based workloads without a VM (they don't want to pay the "cost of virtualization"); I understand the logic of running public external workloads in VMs (better isolation, a more familiar model, heterogeneous workloads, etc.), but with Kubernetes, can't our workloads be scheduled directly on bare metal, just like Google's own workloads?
It is possible to run Kubernetes on both virtual and physical machines (see this link). Google's Cloud Platform only offers virtual machines as a service, and that is why Google Container Engine is built on top of virtual machines.
In Borg, containers allow arbitrary sizes, and they don't pay any resource penalties for odd-sized tasks.
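To make the "odd-sized task" penalty concrete: if machines only come in fixed shapes, a task is rounded up to the next shape and the difference is stranded. A small worked sketch, assuming a hypothetical catalogue of power-of-two vCPU shapes (real catalogues differ):

```python
from typing import Sequence

# Hypothetical VM shapes, in vCPUs.
VM_SHAPES: Sequence[int] = (1, 2, 4, 8, 16)


def smallest_fitting_shape(request: float, shapes: Sequence[int] = VM_SHAPES) -> int:
    """Smallest fixed shape that can hold a task requesting `request` vCPUs."""
    for size in sorted(shapes):
        if size >= request:
            return size
    raise ValueError(f"no shape large enough for {request} vCPUs")


def stranded_fraction(request: float) -> float:
    """Share of the chosen VM that the task leaves unused."""
    vm = smallest_fitting_shape(request)
    return (vm - request) / vm


for cpus in (3, 4, 5):
    print(cpus, "vCPUs ->", smallest_fitting_shape(cpus), "vCPU VM, stranded:", stranded_fraction(cpus))
```

A 3-vCPU task on a 4-vCPU VM strands 25% of it; an arbitrarily sized container sized at exactly 3 vCPUs strands nothing.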
From what I gather, the only way to use a MySQL database with Azure websites is to use ClearDB, but can I install MySQL on VMs provided in Azure Cloud Services? And if so, how?
This question might get closed and moved to ServerFault (where it really belongs). That said: ClearDB provides MySQL-as-a-Service in Azure. It has nothing to do with what you can install in your own Virtual Machines. You can absolutely do a VM-based MySQL install (or any other database engine that you can install on Linux or Windows). In fact, the Azure portal even has a tutorial for a MySQL installation on OpenSUSE.
If you're referring to installing in web/worker roles: This simply isn't a good fit for database engines, due to:
the need to completely script/automate the install with zero interaction (which might take a long time). This includes all necessary software being downloaded/installed to the VM images every time a new instance is spun up.
the likely inability of a database cluster to cope with arbitrary scale-out (the typical use case for web/worker roles). Database clusters may or may not work well when a scale-out occurs (adding an additional VM). Same thing when scaling in (removing a VM).
less-optimal attached-storage configuration
inability to use Linux VMs
So, assuming you're still OK with Virtual Machines (vs stateless Cloud Service VMs), you'll need to carefully plan your deployment, with decisions such as:
Distro (Ubuntu, CentOS, etc.); Azure-supported Linux distro list here
Selecting proper VM size (the DS series provide SSD attached disk support; the G series scale to 448GB RAM)
Azure Storage attached disks being non-Premium or Premium (premium disks are SSD-backed, durable disks scaling to 1TB/5000 IOPS per disk, up to 32 disks per VM depending on VM size)
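Since each premium disk has its own IOPS ceiling, higher throughput comes from attaching several disks and striping them (e.g. software RAID 0). A back-of-the-envelope sketch using the per-disk and disk-count figures quoted above (these may have changed since); the VM-level cap is a hypothetical parameter, because VM sizes also limit total disk throughput:

```python
from typing import Optional

PER_DISK_IOPS = 5000  # premium-disk figure quoted above
MAX_DISKS = 32        # upper bound quoted above; the real limit depends on VM size


def striped_iops(disks: int, vm_cap: Optional[int] = None,
                 per_disk: int = PER_DISK_IOPS, max_disks: int = MAX_DISKS) -> int:
    """Rough aggregate IOPS of `disks` striped premium disks, optionally capped by the VM size."""
    if not 1 <= disks <= max_disks:
        raise ValueError(f"disk count must be between 1 and {max_disks}")
    total = disks * per_disk
    return total if vm_cap is None else min(total, vm_cap)


print(striped_iops(4))                 # 4 disks x 5000 IOPS
print(striped_iops(32, vm_cap=64000))  # the (hypothetical) VM cap is reached first
```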
Virtual network configuration (for multi-node cluster)
Accessibility of the database cluster (whether your app is in the VNet or accesses it through a public endpoint; and if the latter, setting up ACLs)
Backup / HA / DR planning
Someone else mentioned using a pre-built VM image from VM Depot. Just realize that, if you go that route, you're relying on someone else to configure the database engine install for you. This may or may not be optimal for what you're trying to achieve. And the images may or may not be up-to-date with the latest versions, patches, etc.
Of course, what I wrote applies to any database engine you install in your own virtual machines, where a service provider (such as ClearDB) tends to take care of most of these things for you.
If you are talking about standard VMs, then you can use a pre-built image from VM Depot for that.
If you are talking about web or worker roles (PaaS), I wouldn't recommend it, but if you really want to, you could. You would need to fully script the install of the solution on the host. The only downside (and it's a big one) is that the instance will be moved to a new host at some point, which would mean your MySQL data files would be lost; if you backed up frequently and were happy to lose some data, this option may work for you.
I think the main question is "what do you want to achieve?". As I see it, you want to use a PaaS solution with Web Apps or Cloud Service and you need a MySQL database. If so, you have two options (both technically feasible, as David Makogon said). The first is to deploy your own single MySQL server and connect to it from the outside (the internet side). The second is to create a MySQL server or cluster and connect your application to it internally, inside an Azure virtual network. With Cloud Service this is simple, but with Web Apps it is not: you must create a VPN gateway in Azure and connect your Web App to this gateway. This way you will have an internal connection from your application to your own MySQL cluster.
Simple question: can a Java service layer running on Tomcat 7 on a host machine connect to a persistent data store (MySQL) running inside VirtualBox with port forwarding? I want to know whether Hibernate or JDBC connection strings on the host machine work if the MySQL server is installed inside a VirtualBox VM.
Also, if it does work, can I expect behavioral deviations in terms of speed and connection pooling if everything is packaged into one single system and deployed on a real-world web server in a single environment?
The short answer is yes, it is possible and will work. You will likely have to play with the firewall settings on your VirtualBox instance. You don't specify the OS, so it's hard to tell you exactly what you'll need to tweak.
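Before debugging Hibernate or JDBC settings, it can help to check that the forwarded port is reachable at all from the host. A minimal sketch, assuming a NAT forwarding rule from host port 3307 to guest port 3306 (for example created with `VBoxManage modifyvm "<vm-name>" --natpf1 "mysql,tcp,,3307,,3306"`; the VM name and ports are placeholders). With such a rule the JDBC URL on the host would point at the host side, e.g. `jdbc:mysql://127.0.0.1:3307/<database>`.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


# Assumed forwarding rule: host 127.0.0.1:3307 -> guest port 3306.
print(port_reachable("127.0.0.1", 3307))
```

A successful TCP connection only proves the forwarding and firewall path; MySQL inside the guest must still listen on a non-loopback address and allow the user to log in from the address it sees.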
As far as deploying this in a real-world environment, if you mean production, you probably should NOT do that. This is a great setup to build on, but not something I would run in production.
To be clear, there won't be any behavioral issues: it will act as MySQL always acts, but it will absolutely be slower than running it on 'bare metal'. How much slower will vary based on hardware, workload, etc., and it is generally not a great design for a production deployment.
Is there an easy way to setup an environment on one machine (or a VM) with MySQL replication? I would like to put together a proof of concept of MySQL replication with one Master write instance and two slave instances for reads.
I can see doing it across 2 or 3 VMs running on my computer, but that would really bog down my system. I'd rather have everything running on the same VM. What's the best way to proof out scalability solutions like this in a local dev environment?
Thanks for your help,
Dave
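For the single-VM variant the question asks about, one common approach is several `mysqld` instances managed by `mysqld_multi`, which reads numbered `[mysqld1]`, `[mysqld2]`, ... option groups. Each instance needs its own port, socket, data directory and `server-id`. The sketch below just generates such groups for one source and two read-only replicas; the base port and directory are hypothetical, and after starting the instances the replicas still have to be pointed at the source (`CHANGE MASTER TO`, or `CHANGE REPLICATION SOURCE TO` on newer versions) with a replication user.

```python
from typing import List


def mysqld_multi_config(replicas: int = 2, base_port: int = 3306,
                        base_dir: str = "/var/lib/mysql-repl") -> str:
    """Build mysqld_multi option groups for one source plus `replicas` replicas on one host."""
    sections: List[str] = []
    for i in range(replicas + 1):
        n = i + 1  # groups are numbered [mysqld1], [mysqld2], ...
        lines = [
            f"[mysqld{n}]",
            f"port      = {base_port + i}",
            f"socket    = {base_dir}/mysqld{n}.sock",
            f"datadir   = {base_dir}/data{n}",
            f"server-id = {n}",  # must be unique per instance
        ]
        if i == 0:
            lines.append("log-bin   = mysql-bin")  # source: write the binary log replicas read from
        else:
            lines.append("read_only = 1")          # replicas: serve reads only
        sections.append("\n".join(lines))
    return "\n\n".join(sections) + "\n"


print(mysqld_multi_config())
```

As the answers below point out, this setup skips the network constraints a real multi-host deployment has.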
I think to truly test MySQL Replication it is important to do so in realistic constraints.
If you put all the replica nodes under one operating system, then you no longer have the bandwidth constraint: the data transfer speed would be much higher than what you would get if those replica DBs were on different sites.
Putting everything under one VM also shortcuts the configuration; for instance, it does not make you go through configuring the networking.
I suggest you use multiple VMs, even if you have to put them on one physical machine; you can always configure the hypervisor to make the packets go through a router, in which case the I/O will be bound by whatever throughput the network interface has.
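A quick worked example of why the bandwidth constraint matters: shipping the same amount of replication traffic takes very different times on different links. The 1 GB workload and the link speeds below are hypothetical, and the estimate ignores latency and protocol overhead:

```python
def transfer_seconds(data_bytes: int, link_mbit_per_s: float) -> float:
    """Time to ship `data_bytes` over a link of the given speed (ignores latency and overhead)."""
    bits = data_bytes * 8
    return bits / (link_mbit_per_s * 1_000_000)


one_gb = 1_000_000_000  # hypothetical 1 GB of binary-log changes
print(transfer_seconds(one_gb, 10))    # over a 10 Mbit/s inter-site link
print(transfer_seconds(one_gb, 1000))  # over a 1 Gbit/s LAN
```

That is 800 s versus 8 s, and replicas sharing one operating system see neither limit, which is exactly what a single-machine test hides.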
"I can see doing it across 2 or 3 VMs running on my computer, but that would really bog down my system."
You can try making a few VMs with JeOS (Just Enough OS) versions of the operating system you want. I know Ubuntu has one and it can boot with 128 MB of RAM, which makes it convenient to deploy lots of cloned VMs on one physical machine without monster RAM.
The next step would be to do the same thing on a cloud (Infrastructure as a Service, IaaS) provider and try your setup across different geographical sites.
If what you're testing is machine-to-machine replication, then setting up multiple VMs on a virtual private network would be the correct environment to test it. If you use Ubuntu Server, you don't have to install more than you actually need -- just give the VMs enough space for a base install + MySQL + your data. Memory usage can be as little as 256MB per VM. All you have to do is suspend or shutdown the VMs when you're not running a full-up test.
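To check whether a source-plus-two-replicas lab fits on your workstation, multiply the VM count by the per-VM memory and add what the host itself needs. A trivial sketch using the 128 MB and 256 MB figures mentioned in these answers; the 2048 MB host reserve is a hypothetical value:

```python
def host_ram_needed_mb(vms: int, per_vm_mb: int, host_reserve_mb: int) -> int:
    """Total RAM the host needs to run `vms` VMs of `per_vm_mb` each, plus its own reserve."""
    if vms < 0 or per_vm_mb < 0 or host_reserve_mb < 0:
        raise ValueError("values must be non-negative")
    return vms * per_vm_mb + host_reserve_mb


print(host_ram_needed_mb(3, 256, 2048))  # Ubuntu Server at ~256 MB per VM
print(host_ram_needed_mb(3, 128, 2048))  # JeOS at ~128 MB per VM
```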
I've had situations where I was running 4 or more VMs simultaneously on my workstation, either for development or testing purposes -- it's not that taxing unless you're trying to do video rendering in each VM.