How to set up HashiCorp Vault with Consul on OpenShift

I implemented HashiCorp Vault with Raft storage, but my organization now wants to switch from Raft to Consul, that is, remove the present Vault cluster and reinstall it with Consul. However, I found the following in the official HashiCorp documentation:
Reference: HashiCorp documentation (image)
At the same time, GitHub and various blogs provide installation steps for HashiCorp Vault with Consul. So please help me understand: which is preferred, Raft or Consul, and why?

I think the real question you are asking here is Vault's Integrated Storage (aka Raft) vs. Consul (external storage).
There are several aspects to this but the top 3 are:
REDUCED COST & REDUCED OPERATIONAL OVERHEAD & LACK OF CONSUL EXPERTISE ON YOUR
If you want to reduce operational costs (by reducing administrative overhead), then Integrated Storage (Raft) is the preferred choice. In a standard cluster configuration, you only need 5 Vault nodes, which translates to 5 VMs on AWS, Azure, GCP, etc. With Consul you will need 3 Vault VMs plus 5 or more Consul VMs, so a minimum of 8 (see Reference Architecture with Consul).
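The node counts above can be checked with a little arithmetic. The following is a minimal sketch, not an official sizing tool: the function and variable names are made up for illustration, and the only inputs are the VM counts quoted in this answer. It also shows the usual Raft majority rule, which is why a 5-member cluster can lose 2 members and keep working.

```python
def quorum(nodes: int) -> int:
    """Smallest majority of a Raft cluster with `nodes` voting members."""
    return nodes // 2 + 1


def failure_tolerance(nodes: int) -> int:
    """How many voting members can be lost while a majority still remains."""
    return nodes - quorum(nodes)


def total_vms(layout: dict) -> int:
    """Total number of VMs in a deployment layout (role name -> VM count)."""
    return sum(layout.values())


# VM counts quoted in the answer above (illustrative names).
integrated_storage = {"vault": 5}
consul_backend = {"vault": 3, "consul": 5}

if __name__ == "__main__":
    print("Integrated Storage VMs:", total_vms(integrated_storage))
    print("Consul backend VMs:", total_vms(consul_backend))
    print("Failures tolerated by 5 Raft members:", failure_tolerance(5))
```

With Integrated Storage the 5 Vault nodes form the Raft cluster themselves; with Consul the Raft cluster is the 5 Consul servers and the 3 Vault VMs sit on top of it, which is where the extra machines come from.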
INSPECTING VAULT DATA
If you need to inspect Vault's data frequently, then Consul as external storage is the better option.
SIDECAR + SERVICE DISCOVERY
If you rely on Consul's service discovery and sidecar proxy pattern, then you need Consul anyway. By contrast, if you are only leveraging Vault's secret management features and capabilities, then Integrated Storage (Raft) will do just fine.
Take a look at the checklist at the end of this article and at the reference architectures for more clarity.
For leveraging different types of backends with Vault, you can take a look at this Pluralsight course: Managing Access and Secrets in HashiCorp Vault
There are courses covering Consul as well, of course. But generally, Consul is a lot more than just a Vault backend for storing data.

Related

Should I use k8s statefulsets directly or mysql-operator to deploy master-slave mysql cluster?

So I want to deploy a master-slaves MySQL cluster in k8s. I found 2 ways that seem popular:
The first one is to use statefulsets directly from k8s official document: https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/
The second one is to use an operator, e.g. https://github.com/oracle/mysql-operator
Which way is most commonly used?
Also, in statefulsets, if my MySQL master dies, will k8s automatically promote the slave to be the master?
Lastly, when my logic backend app performs an operation (CRUD) to MySQL cluster, how does k8s know which pod to route to, i.e. write operation can only be sent to master while read is sent to all?
Users can deploy and maintain a set of highly available MySQL services in k8s based on StatefulSets, but the process is relatively complex. It requires users to familiarize themselves with various k8s resource objects, learn many MySQL operational details and maintain a set of complex management scripts. Kubernetes Operators are designed to lower the barrier to deploying complex applications on k8s.
An Operator hides the orchestration details of a complex application and makes it much easier to run in k8s. If you need to deploy other complex applications, we recommend that you use an Operator as well.
As for master election when using a StatefulSet: promoting a slave to master is not an automatic process. You have to configure this manually using Xtrabackup. More information is available here: setting_up_replication.
Take a look: cloning-existing-data, starting-replication, mysql-statefulset-operator.
Useful tools: vitess for better MySQL networking management and percona-xtradb-cluster that provides superior performance, scalability and instrumentation.
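On the routing part of the question: a Kubernetes Service does not look inside SQL traffic, so it cannot tell reads from writes. The application (or a proxy in front of MySQL) has to pick the endpoint. A minimal sketch of that choice follows. The names `mysql` (StatefulSet and headless Service) and `mysql-read` (a Service spanning all replicas) are assumptions borrowed from the naming used in the Kubernetes replicated-MySQL tutorial, and the sketch assumes the pod with ordinal 0 is the master.

```python
def pod_dns_name(statefulset: str, ordinal: int, headless_service: str) -> str:
    """Stable per-pod DNS name that a StatefulSet pod gets via its headless Service."""
    return f"{statefulset}-{ordinal}.{headless_service}"


def mysql_host(is_write: bool,
               statefulset: str = "mysql",
               headless_service: str = "mysql",
               read_service: str = "mysql-read") -> str:
    """Pick the host an application should connect to.

    Writes go to the pod with ordinal 0 (assumed to be the master);
    reads go to a Service that load-balances across all replicas.
    """
    if is_write:
        return pod_dns_name(statefulset, 0, headless_service)
    return read_service


if __name__ == "__main__":
    print("writes ->", mysql_host(is_write=True))
    print("reads  ->", mysql_host(is_write=False))
```

Note that this only works while ordinal 0 really is the master. If another pod is promoted manually, the write endpoint has to be updated, which is one of the chores an operator is meant to take over.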

AWS authentication to Vault

We're using Vault to store our application secrets and config. When our app (Java) starts, a script does all the magic of getting the secrets and config from Vault and storing them locally for the application to read. The script is authenticating to Vault using AWS IAM role.
Now we're getting to a situation where the application needs to read secrets from Vault on the go, not just on startup. For that purpose, I need it to be able to do the authentication pretty much on every request. It's worth mentioning that the app might also run on the developer machine, so whatever authentication done - it needs to work on the EC2 instance as well as the local development environment.
I'm currently leaning towards creating a username and password and storing them in Vault for the application to fetch when starting up. The application could then use that username/password to authenticate to Vault whenever it needs to.
I'm also considering AppRole, but can't really see any real advantage to it over simple user/password setup.
What's the best solution for this use case? Any advice would be highly appreciated!
Thanks,
Yosi
The AWS recommendation for storing secrets is to use AWS Systems Manager Parameter Store.
Software running on an Amazon EC2 instance with an assigned Role can use those credentials to access the Parameter Store to retrieve application secrets.
The Parameter Store can also be used outside of EC2, but some AWS credentials will still be needed to authenticate to the Parameter Store.
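As a rough illustration of reading secrets "on the go" from the Parameter Store, here is a minimal sketch. The class name, the TTL value and the parameter name are hypothetical. The only AWS call used is `get_parameter(Name=..., WithDecryption=True)` on a boto3 SSM client, which is passed in from outside so the same code works on EC2 (instance role) and on a developer machine (local AWS credentials). The small in-memory cache is an assumption of this sketch, added so that the store is not called on every single request.

```python
import time


class CachedParameterReader:
    """Reads parameters through an SSM client and caches them for a short time."""

    def __init__(self, ssm_client, ttl_seconds=300.0, clock=time.monotonic):
        self._ssm = ssm_client      # e.g. boto3.client("ssm")
        self._ttl = ttl_seconds     # how long a cached value is considered fresh
        self._clock = clock         # injectable clock, handy for testing
        self._cache = {}            # parameter name -> (expires_at, value)

    def get(self, name):
        now = self._clock()
        hit = self._cache.get(name)
        if hit is not None and hit[0] > now:
            return hit[1]           # still fresh, no network call
        response = self._ssm.get_parameter(Name=name, WithDecryption=True)
        value = response["Parameter"]["Value"]
        self._cache[name] = (now + self._ttl, value)
        return value


# Usage (requires boto3 and AWS credentials; the parameter name is hypothetical):
#   import boto3
#   reader = CachedParameterReader(boto3.client("ssm"))
#   db_password = reader.get("/myapp/db/password")
```

Because the client is injected, boto3 resolves credentials the usual way in both environments, and the application code itself does not need to know which one it is running in.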

OpenShift 3.5 Architecture - VM Provisioning

I have been tasked with recommending the VM provisioning for an OpenShift production environment. The OpenShift installation documents don't really detail a lot of different options. I know that we want High Availability (which means multiple masters) but some of the things that I'm a bit confused by are:
separate hosts for etcd
infrastructure nodes
Do I need separate hosts/nodes for etcd? (advantages seem to be performance related but would like to better understand)
Do I need separate hosts/nodes for the infrastructure components (registry, router, etc.) or can these just be hosted on the master nodes?
AFAIK etcd can be on the same host as the master unless you have a really big cluster and want to maintain etcd separately from the OpenShift cluster.
Running routers on dedicated nodes helps with high availability and reduces the chance of nodes running into health issues caused by other container workloads on the same machine. Applications inside the OpenShift cluster can keep running even if all masters go down (which should be rare), but router nodes need to be available all the time to serve traffic.
There are many reference architectures published by Red Hat; check out blog.openshift.com and the official docs on redhat.com.
etcd and masters can be installed on the same node or separately. Here you can find some best practices for etcd. As you can see there, installing it separately is recommended, and this is what I would suggest if you can "afford" more servers. If not, co-locating masters and etcd is, so to speak, symbiotic, in that masters are CPU intensive whereas etcd uses a lot of disk IO and memory.
Regarding infrastructure deployments such as routers, docker-registry, EFK stack, metrics and so forth, the recommended deployment configuration (all within your possibilities) is that masters are not schedulable, and they worry only about serving the API and controlling the nodes. Then you can split your schedulable nodes into infrastructure and compute nodes.
Infrastructure nodes will only host applications used by the cluster itself or by other applications (e.g. GitLab or Nexus)
Worker/Compute nodes will host business applications
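The split described above boils down to two rules: masters are not schedulable, and every workload selects nodes by label. A minimal sketch of that placement logic follows. The node names and the `role` label key are made up for illustration, and this is a toy model of node-selector matching, not the actual OpenShift scheduler.

```python
def matches(node_labels: dict, selector: dict) -> bool:
    """A node matches when it carries every key/value pair of the selector."""
    return all(node_labels.get(key) == value for key, value in selector.items())


def eligible_nodes(nodes: list, selector: dict) -> list:
    """Names of schedulable nodes whose labels satisfy the selector."""
    return [
        node["name"]
        for node in nodes
        if node["schedulable"] and matches(node["labels"], selector)
    ]


# Hypothetical inventory: one unschedulable master, two infra nodes, one compute node.
NODES = [
    {"name": "master-1", "schedulable": False, "labels": {"role": "master"}},
    {"name": "infra-1", "schedulable": True, "labels": {"role": "infra"}},
    {"name": "infra-2", "schedulable": True, "labels": {"role": "infra"}},
    {"name": "node-1", "schedulable": True, "labels": {"role": "compute"}},
]

if __name__ == "__main__":
    print("router/registry can run on:", eligible_nodes(NODES, {"role": "infra"}))
    print("business apps can run on:", eligible_nodes(NODES, {"role": "compute"}))
```

Routers and the registry would carry the infra selector and business applications the compute selector, so a noisy application can never land on the machines that serve ingress traffic.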
Having a multi-master installation with HA routers is of course the best solution, but then you have to decide how you want to provide this HA: with an external load balancer or with IP failover?
As @debianmaster mentioned, there are several reference architecture documents you can read, like this one here.

Differences between OpenShift and Kubernetes

What's the difference between OpenShift and Kubernetes and when should you use each? I understand that OpenShift is running Kubernetes under the hood but am looking to determine when running OpenShift would be better than Kubernetes and when OpenShift may be overkill.
In addition to the additional API entities, as mentioned by @SteveS, OpenShift also has advanced security concepts.
This can be very helpful when running in an Enterprise context with specific requirements regarding security.
As much as this can be a strength for real-world applications in production, it can be a source of much frustration in the beginning.
One notable example is the fact that, by default, containers run as root in Kubernetes, but run under an arbitrary user with a high ID (e.g. 1000090000) in OpenShift. This means that many containers from DockerHub do not work as expected. For some popular applications, the Red Hat Container Catalog supplies images with this feature/limitation in mind. However, this catalog contains only a subset of popular containers.
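A typical way this bites is an image that assumes it may write to a root-owned directory. The following minimal sketch shows how an application can avoid that assumption by checking at startup which directory it can actually write to under whatever UID it was given. The function name and the candidate paths are hypothetical.

```python
import os
import tempfile


def pick_writable_dir(candidates):
    """Return the first existing directory the current user can write to.

    Falls back to a freshly created temporary directory, which is owned
    by the current (possibly arbitrary) UID and therefore always writable.
    """
    for path in candidates:
        if os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK):
            return path
    return tempfile.mkdtemp(prefix="app-data-")


if __name__ == "__main__":
    # Hypothetical paths an image might expect to own when running as root.
    data_dir = pick_writable_dir(["/var/lib/myapp", "/opt/myapp/data"])
    print(f"uid={os.getuid()} writing to {data_dir}")
```

Images built with this in mind usually also make their data directories group-writable, so that an arbitrary user can use them without such a fallback.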
To get an idea of the system, I strongly suggest starting out with Kubernetes. Minikube is an excellent way to quickly set up a local, one-node Kubernetes cluster to play with. When you are familiar with the basic concepts, you will better understand the implications of the OpenShift features and design decisions.
OpenShift includes a distribution of Kubernetes, so if you don't need the added features of OpenShift, such as the Web Console, Builds, advanced deployment models and much, much more, you can choose to ignore them.
Here's a summary of items available on the OpenShift website.
Kubernetes comes with Ingress rules, whereas OpenShift comes with Routes.
Kubernetes has an IngressController, whereas OpenShift has a Router based on HAProxy.
Switching namespaces in the CLI is very easy in OpenShift, whereas in Kubernetes you need to create contexts and switch between them.
The OpenShift UI is more interactive and informative than the Kubernetes one.
OpenShift has BuildConfig for baking Docker images inside the cluster; Kubernetes has nothing equivalent, so you need to build the image yourself and push it to a registry.
OpenShift has Pipelines, so you don't need a separate Jenkins to deploy an app; Kubernetes has no such thing.
The easiest way to differentiate between them is to understand that while vanilla K8S is a community project, OpenShift is more focused on being an enterprise-ready product. Resources like ImageStreams, BC, Builds, DC, Routes etc., together with functionality such as S2I and the Router, make it easier for developers and admins alike to use OCP for development, deployment and lifecycle management. You can refer to https://cloud.redhat.com/learn/topics/kubernetes/ for more information on the key differences between them.
OCP makes your life much easier by offering simple actions through the oc CLI command and a fine-grained web console.
You can try OCP and get first-hand experience of the features using https://developers.redhat.com/developer-sandbox, where you can quickly get access to a sandboxed environment in a shared cluster.

Is it recommended to run clustered database with Kubernetes in production environment?

Is it reasonable to use Kubernetes for a clustered database such as MySQL in production environment?
There are example configurations such as mysql galera example. However, most examples do not make use of persistent volumes. As far as I've understood persistent volumes must reside on some shared file system as defined here Kubernetes types of persistent volumes. A shared file system will not guarantee that the database files of the pod will be local to the machine hosting the pod. It will be accessed over network which is rather slow. Moreover, there are issues with MySQL and NFS, for example.
This might be acceptable for a test environment. However, what should I do in a production environment? Is it better to run the database cluster outside Kubernetes and run only application servers with Kubernetes?
The Kubernetes project introduced PetSets, a new pod management abstraction intended to run stateful applications. It is an alpha feature at present (as of version 1.4) and moving rapidly. The various issues to address as we move to beta are listed here. Quoting from the section on when to use PetSets:
A PetSet ensures that a specified number of "pets" with unique identities are running at any given time. The identity of a Pet is comprised of:
a stable hostname, available in DNS
an ordinal index
stable storage: linked to the ordinal & hostname
In addition to the above, it can be coupled with several other features which help one deploy clustered stateful applications and manage them. Coupled with dynamic volume provisioning for example, it can be used to provision storage automatically.
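To make the three identity components concrete, here is a minimal sketch that derives them for each member of a set. The naming patterns (`<set>-<ordinal>` for the hostname, `<claim-template>-<pod>` for the volume claim, `cluster.local` as the cluster domain) follow the conventions used by StatefulSets, the successor of PetSets, and are assumed here to apply in the same way. All concrete names in the example are hypothetical.

```python
from typing import List, NamedTuple


class PetIdentity(NamedTuple):
    ordinal: int        # position of the pet within the set
    hostname: str       # stable hostname of the pod
    fqdn: str           # DNS name via the governing (headless) service
    volume_claim: str   # storage tied to this ordinal and hostname


def pet_identities(set_name: str, service: str, namespace: str,
                   claim_template: str, replicas: int,
                   cluster_domain: str = "cluster.local") -> List[PetIdentity]:
    """Derive the stable identity of every pet in a set of `replicas` members."""
    identities = []
    for ordinal in range(replicas):
        hostname = f"{set_name}-{ordinal}"
        fqdn = f"{hostname}.{service}.{namespace}.svc.{cluster_domain}"
        volume_claim = f"{claim_template}-{hostname}"
        identities.append(PetIdentity(ordinal, hostname, fqdn, volume_claim))
    return identities


if __name__ == "__main__":
    for pet in pet_identities("mysql", "galera", "default", "data", replicas=3):
        print(pet)
```

The point for a database cluster is that a pet which is rescheduled onto another machine keeps the same name and is reattached to the same volume claim, so the other members still find it where they expect it.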
There are several YAML configuration files available (such as the ones you referenced) using ReplicaSets and Deployments for MySQL and other databases which may be run in production and are probably being run that way as well. However, PetSets are expected to make it a lot easier to run these types of workloads, while supporting upgrades, maintenance, scaling and so on.
You can find some examples of distributed databases with petsets here.
The advantage of provisioning persistent volumes which are networked and non-local (such as GlusterFS) is realized at scale. However, for relatively small clusters, there is a proposal to allow for local storage persistent volumes in the future.