Is it recommended to run clustered database with Kubernetes in production environment? - mysql

Is it reasonable to use Kubernetes for a clustered database such as MySQL in production environment?
There are example configurations such as mysql galera example. However, most examples do not make use of persistent volumes. As far as I've understood persistent volumes must reside on some shared file system as defined here Kubernetes types of persistent volumes. A shared file system will not guarantee that the database files of the pod will be local to the machine hosting the pod. It will be accessed over network which is rather slow. Moreover, there are issues with MySQL and NFS, for example.
This might be acceptable for a test environment. However, what should I do in a production environment? Is it better to run the database cluster outside Kubernetes and run only application servers with Kubernetes?

The Kubernetes project introduced PetSets, a new pod management abstraction, intended to run stateful applications. It is an alpha feature at present (as of version 1.4) and moving rapidly. A list of the various issues as we move to beta are listed here. Quoting from the section on when to use petsets:
A PetSet ensures that a specified number of "pets" with unique identities are running at any given time. The identity of a Pet is comprised of:
a stable hostname, available in DNS
an ordinal index
stable storage: linked to the ordinal & hostname
In addition to the above, it can be coupled with several other features which help one deploy clustered stateful applications and manage them. Coupled with dynamic volume provisioning for example, it can be used to provision storage automatically.
There are several YAML configuration files available (such as the ones you referenced) using ReplicaSets and Deployments for MySQL and other databases which may be run in production and are probably being run that way as well. However, PetSets are expected to make it a lot easier to run these types of workloads, while supporting upgrades, maintenance, scaling and so on.
You can find some examples of distributed databases with petsets here.
The advantage of provisioning persistent volumes which are networked and non-local (such as GlusterFS) is realized at scale. However, for relatively small clusters, there is a proposal to allow for local storage persistent volumes in the future.

Related

Should I use k8s statefulsets directly or mysql-operator to deploy master-slave mysql cluster?

So I want to deploy a master-slaves MySQL cluster in k8s. I found 2 ways that seem popular:
The first one is to use statefulsets directly from k8s official document: https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/
The second one is to use operator, i.e. https://github.com/oracle/mysql-operator
Which way is most commonly used?
Also, in statefulsets, if my MySQL master dies, will k8s automatically promote the slave to be the master?
Lastly, when my logic backend app performs an operation (CRUD) to MySQL cluster, how does k8s know which pod to route to, i.e. write operation can only be sent to master while read is sent to all?
Users can deploy and maintain a set of highly available MySQL services in k8s based on StatefulSets, the process is relatively complex. This process requires users to familiarize themselves with various k8s resource objects, learn many MySQL operation details and maintain a set of complex management scripts. Kubernetes Operators are designed to reduce the threshold for deploying complex applications on k8s.
Operator hides the orchestration details of complex applications and greatly reduces the threshold to use them in k8s. If you need to deploy other complex applications, we recommend that you use the Operator.
Speaking about master election while using StatefulSet.
Electing potential slave to be a master is not an automatic process - you have to configure this manually using Xtrabackup - here is more information - setting_up_replication.
Take a look: cloning-existing-data, starting-replication, mysql-statefulset-operator.
Useful tools: vitess for better MySQL networking management and percona-xtradb-cluster that provides superior performance, scalability and instrumentation.

OpenShift 3.5 Architecture - VM Provisioning

I have been tasked with recommending the VM provisioning for an OpenShift production environment. The OpenShift installation documents don't really detail a lot of different options. I know that we want High Availability (which means multiple masters) but some of the things that I'm a bit confused by are:
separate hosts for etcd
infrastructure nodes
Do I need separate hosts/nodes for etcd? (advantages seem to be performance related but would like to better understand)
Do I need separate hosts/nodes for the infrastructure components (registry, router, etc.) or can these just be hosted on the master nodes?
AFAIK etcd can be on same host as master unless you really have a big cluster and want maintenance of etcd separate of openshift cluster.
Running routers on dedicated nodes help having high availability and reduce chances of nodes running into health issues due to other container work loads running on same machine. applications inside openshift cluster can run even if all masters go down (may be rare) but router nodes need to be available all the time for serving traffic.
There are many reference architectures published by redhat checkout blog.openshift.com and also redhat.com official docs
etcd and masters can be installed in the same node or separately. Here you can find some best practices for etcd. As you see, here is recommended that it is installed separately and this is what I would suggest if you can "afford" more servers. If not, co-locating masters and etcds we can say is symbiotic in that masters are CPU intensive whereas etcd uses a lot of disk IO and memory.
Regarding infrastructure deployments such as routers, docker-registry, EFK stack, metrics and so forth, the recommended deployment configuration (all within your possibilities) is that masters are not schedulable, and they worry only about serving the API and controlling the nodes. Then you can split your schedulable nodes into infrastructure and compute nodes.
Infrastructure nodes will only host applications used by the cluster itself or by other applications (i.e. Gitlab or Nexus)
Worker/Compute nodes will host business applications
Having a multi-master installation with HA routers is of course the best solution, but then you have to decide how you want to provide this HA, is it with an external LoadBalancer or with IP Failover?
As #debianmaster mentioned, there are several reference architecture documents you can read. Like this one here

Can I install MySQL on the VMs provided in Azure Cloud Services?

From what I gather, the only way to use a MySQL database with Azure websites is to use Cleardb but can I install MySQL on VMs provided in Azure Cloud Services. And if so how?
This question might get closed and moved to ServerFault (where it really belongs). That said: ClearDB provides MySQL-as-a-Service in Azure. It has nothing to do with what you can install in your own Virtual Machines. You can absolutely do a VM-based MySQL install (or any other database engine that you can install on Linux or Windows). In fact, the Azure portal even has a tutorial for a MySQL installation on OpenSUSE.
If you're referring to installing in web/worker roles: This simply isn't a good fit for database engines, due to:
the need to completely script/automate the install with zero interaction (which might take a long time). This includes all necessary software being downloaded/installed to the vm images every time a new instance is spun up.
the likely inability for a database cluster to cope with arbitrary scale-out (the typical use case for web/worker roles). Database clusters may or may not work well when a scale-out occurs (adding an additional vm). Same thing when scaling in (removing a vm).
less-optimal attached-storage configuration
inability to use Linux VMs
So, assuming you're still ok with Virtual Machines (vs stateless Cloud Service vm's): You'll need to carefully plan your deployment, with decisions such as:
Distro (Ubuntu, CentOS, etc). Azure-supported Linux distro list here
Selecting proper VM size (the DS series provide SSD attached disk support; the G series scale to 448GB RAM)
Azure Storage attached disks being non-Premium or Premium (premium disks are SSD-backed, durable disks scaling to 1TB/5000 IOPS per disk, up to 32 disks per VM depending on VM size)
Virtual network configuration (for multi-node cluster)
Accessibility of database cluster (whether your app is in the vnet or accesses it through a public endpoint; and if the latter, setting up ACL's)
Backup / HA / DR planning
Someone else mentioned using a pre-built VM image from VM Depot. Just realize that, if you go that route, you're relying on someone else to configure the database engine install for you. This may or may not be optimal for what you're trying to achieve. And the images may or may not be up-to-date with the latest versions, patches, etc.
Of course, what I wrote applies to any database engine you install in your own virtual machines, where a service provider (such as ClearDB) tends to take care of most of these things for you.
If you are talking about standard VMs then you can use a pre-built images on VMDepot for that.
If you are talking about web or worker roles (PaaS) I wouldn't recommend it, but if you really want to you could. You would need to fully script the install of the solution on the host. The only downside (and it's a big one) you would have would be the that the host will be moved to a new host at some point which would mean your MySQL data files would be lost - if you backed up frequently and were happy to lose some data then this option may work for you.
I think, that the main question is "what You want to achieve?". As I see, You want to use PaaS solution with Web Apps or Cloud Service and You need a MySQL database. If Yes, You have two options (both technically as David Makogon said). First one is to deploy Your own (one) server with MySQL and connect to it from the outside (internet side). Second solution is to create one MySQL server or cluster and connect Your application internally in Azure virtual network. WIth Cloud Service it is simple but with Web App it is not. You must create VPN gateway in Azure VM and connect Your Web App to this gateway. In this way You will have internal connection wfrom Your application to Your own MySQL cluster.

Best solution for automated collection of data from remote MySQL servers

I have done extensive research, i feel that i have good candidates but i still lack enough knowledge to decide which one i should implement, ideally i would like to hear from someone that actually implemented a solution to a similar problem.
The Problem
Our project consists of a community of distributed nodes (25 nodes). The nods run on Linux computers, and are installed in the typical residential setting (behind a NAT), with wide dispersion geographically and ISP wise.
Our software on the node collects a variety of its own unique data which is logged to MySQL DB on the local host (node) which is not WAN accessible directly. We also have a Web interface for each node that uses the local node DB to allow the local node user to visualize certain data and parameters; this is only accessible on the LAN.
We typically set-up and maintain an open port for SSH from our labs to each node. All remote DB on nodes have the exact same schema but completely different data. We need an automated way to collect all data from all the nodes and get them to our WAN accessible lab servers (Windows 7 servers, but can be Linux if it provide a better solution). We have narrowed the option as follow:
Solutions:
Create a .bat script that sequentially connect to each node over SSH to import data.
Use the web interface that runs on each node to periodically query the local db then save that data to a central MySQL server. I know i can connect to two db in PHP. seems to be doable here.
Use the MySQL supported “slave-master” replication setup which will duplicate all remote databases on the server.
Use the MySQL supported federated engine setup which will link local tables to remote ones.
Questions:
Are all these viable solutions?
Any major cons i should be aware of for the viable ones?
is there better solutions available (paid or otherwise) ?

Java EE application deployment on Amazon EC2

We have a Java EE application (EAR file deployed on JBoss, MySQL, MongoDB) which we would like to deploy on an Amazon EC2 instance. I have several questions regarding deployment best practices.
What is the most commonly used Linux AMI which we can rely on for a robust deployment (There are so many Linux variants, and I am not sure which AMI is commonly used, is it Fedora, CentOS, Red Hat, SUSE ...)
How do we handle production upgrades (EAR file modifications or schema upgrades). Are there any tools which are available to handle this installation or rollback of these changes.
What kind of data backup capability is available for the database?
Should I rely on Amazon RDS for MySQL support?
How should I handle support for MongoDB?
This is the first time, I am hosting an web-app and would appreciate some inputs on how to manage the production instance.
I agree with Mark Robinson's answer: Use whichever Unix variant you're most comfortable with. It may pay to pick one with decent cloud support. For my site I use Ubuntu.
I have a common image which is the base of every version deploy I do. I have www.mysite.com pointing to an Elastic IP so I can decide which instance it goes to. The common image has all the software I need installed (Postgres/Postgis/Tomcat/etc) but the database and web server data folders and symlinked to Elastic Block Store (EBS) instances.
When it comes time to do a deploy I start a new instance up, freeze and snapshot the EBS volumes on production and make new volumes. I point my new instance at the new volumes and then install whatever I need to onto that. Once I've smoke tested everything successfully I can switch the Elastic IP to point to the new instance and everything keeps on going.
I'll note that I currently have the advantage where only I can modify the database; no users can. This will become a problem shortly.
If you use the XFS filesystem on top of the EBS volume then you can tell XFS to freeze the file system (so no updates happen) then call the EC2 api to snapshot the volume then unfreeze the file system. The result is that the snapshot is taken quickly and sent to S3. I have a nightly script which does this.
If RDS looks like it will suit your needs then use it. Amazon is building lots of solid tools quickly and this will ease your scalability issues if you have any.
I'm sorry, I have no idea.
Good question!
1) I would recommend going with whatever Linux variant you are most comfortable with. If you have someone who is really keen on CentOS, go with that. Once you have selected your AMI, take it and customize it by configuring how you want it. Then save that AMI as you base-layout. It will make rolling out new machines much easier and save your bacon if EC2 goes down.
2) Upgrades with EC2 can be tres cool. Instead of upgrading a live system, take your pre-configured AMI, update that and save that AMI as myAMI-1.1 (or whatever). That way, you can flip over to the new system almost instantly AND roll back to a previous version in case something breaks. You can also back-up DB instances to S3. It's cheap at about $0.10/GB/Month.
3) It depends where you are storing your DB. If you are storing it on your EC2 instance you are in trouble. The EC2 instances have no persistence storage. So if your machine crashes, you lose everything. I'm not familiar with Amazon DB system but you should also look into Elastic Block Store. It's basically an actual hard-drive you can write to. When you want to upgrade your schema, do a full DB dump to S3 and then do an upgrade of your actual schema. If something goes wrong, you can pull the previous version out of S3.
4) & 5) I have never used those so I can't help you.
What is the most commonly used Linux AMI which we can rely on for a robust deployment (There are so many Linux variants, and I am not sure which AMI is commonly used, is it Fedora, CentOS, Red Hat, SUSE ...)
How do we handle production upgrades (EAR file modifications or schema upgrades). Are there any tools which are available to handle this installation or rollback of these changes.
What kind of data backup capability is available for the database?
Should I rely on Amazon RDS for MySQL support?
How should I handle support for MongoDB?
Any Linux AMI will do the job, what you need is a JRE only. (assuming development work not required). If you need to monitor the JVM behavior then get JConsole installed.
Easiest and painless way is to SSH into the local home directory, transfer the updated class file/EAR file (depends the number of changes applied) and copy and replace into the Tomcat deployment directory, restart apache. (make sure you tested locally before upload to production).
Depends on which database you are using, if you are using MySQL then just do scheduled backup that writes to your home directory so that from time to time you could SSH in and download a copy for backup purpose.
I would not consider reply on Amazon RDS for MySQL support due to 2 reasons: MySQL is small enough and manageable, and also I would want to have total complete control of the database and why pay for more when you can do it yourself FOC?
The usage of MongoDB should be align with the purpose of your application and benefits you gain from that. I would recommend you use MongoDB for static data retrieval like state, country, area etc... where MySQL to be use for transaction data only.
If you can live with deploying your Java EE application on TomEE instead of JBoss, Boxfuse does what you want.
For you Java EE application you literally only have to execute (TomEE uses war files instead of ear files):
boxfuse run my-tomee-app-1.0.war -env=prod
This will
Create AMI containing TomEE and your application ready to boot
Create an Elastic IP or ELB
Create a security group with the correct ports defined
Create an auto-scaling group
Launch your instance(s)
Any subsequent update will be done as a zero downtime blue/green deployment.
More info: https://boxfuse.com/blog/javaee-aws