database docker container design - mysql

I am working with Docker containers on a Linux machine. I have to create a database Docker container, and I have chosen MySQL. I have three requirements:
Load balancing - the database contains a huge table with approx. 100 million records, so we need to shard the table across multiple servers. To cater for this I have chosen MySQL Cluster. I need to distribute the data based on the shard key. The load balancing will be done by HAProxy.
Question: please correct me if I am wrong, or provide a better solution.
Persistence - even if all the database containers die, it should be possible to recover from it. For this I have planned to create a data-only Docker container.
Question: if the data-only Docker container dies, will it be able to recover? Is there any change in the volume when it comes back up?
Availability - since there will be multiple SQL servers with the replica feature, even if one server dies another server will become primary.
Question: please correct me if I am wrong, or provide a better solution.

Once upon a time, I remember when a database table with one million records was considered "big data"...
Before assuming you need to split your dataset across multiple machines, I would highly suggest that you first get comfortable with running a single database within a Docker container. Given enough resources, MySQL is quite capable of scaling up to 100 million records.
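For example, here is a minimal sketch of a single-node setup with a named volume, so the data survives the container being removed and recreated (the image tag, password and volume name are illustrative):

    # Create a named volume and mount it at MySQL's data directory.
    docker volume create mysql-data
    docker run -d --name mysql \
      -e MYSQL_ROOT_PASSWORD=changeme \
      -v mysql-data:/var/lib/mysql \
      mysql:5.7
    # Removing and recreating the container leaves mysql-data intact.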
Docker is designed to isolate processes from others running on the same host. This creates challenges for monolithic applications, which frequently have a software architecture involving multiple processes communicating with each other over some form of host-based IPC (inter-process communication). That does not mean they cannot be containerized, but a large multi-process container looks and operates a lot like a virtual machine, implying that perhaps Docker is a less optimal technological fit.
Before I get too negative: it's completely possible to run clustered MySQL using Docker. A couple of examples returned by Google:
http://galeracluster.com/2015/05/getting-started-galera-with-docker-part-1/
http://severalnines.com/blog/how-deploy-galera-cluster-mysql-using-docker-containers
My warning is that you see fewer examples of running these clusters across multiple Docker hosts, implying that the use cases are currently mostly demos or tests.

Related

1GB database with only two records

I identified an issue with an infrastructure I created on the Google Cloud Platform and would like to ask the community for help.
A charge was made that I did not expect; theoretically it should be almost impossible for me to exceed the limits of the free tier. But I noticed that my database is huge, at 1GB and growing, and there are some very heavy buckets.
My database is managed by a Django app, and according to the tool's admin panel there are only 2 records in production. There were several backups and things like that, but it wasn't me.
Can anyone give me a guide on how to solve this?
I would assume that you manage the database yourself, i.e. it is not Cloud SQL. Databases pre-allocate files in larger chunks in order to be efficient on writes. You can check this yourself: write an additional 100k records, and most likely the size will not change.
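If you want to see where that 1GB actually goes, a quick check against information_schema shows data plus index size per schema (credentials are illustrative):

    mysql -u root -p -e "
      SELECT table_schema,
             ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) AS size_mb
      FROM information_schema.tables
      GROUP BY table_schema;"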
Go to Cloud SQL and it will say how many SQL instances you have. If you see the "create instance" window, that means you don't have any Google-managed SQL instances and the one we're talking about is a standalone one.
You can also check Deployment Manager to see if you deployed one from the Marketplace, or if it's a standalone VM with MySQL installed manually.
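The same checks can be done from the command line, assuming you have the gcloud CLI installed and authenticated:

    gcloud sql instances list                    # lists Google-managed Cloud SQL instances
    gcloud deployment-manager deployments list   # lists Marketplace/Deployment Manager deployments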
I made a simple experiment and deployed (from the Marketplace) MySQL 5.6 on top of Ubuntu 14.04 LTS. The initial DB size was over 110MB - but there are some records in it initially (schema, permissions, etc.), so it's not "empty" at all.
It is still weird that your DB is already 1GB. Try to deploy a new one (if your use case permits that). Maybe this is some old DB that was used for some purpose, all the content was deleted afterwards, and some "trash" still remains and takes up a lot of space.
Or - it may well be exactly as NeatNerd said - the DB pre-allocates some space for performance and optimisation reasons.
If I have more details I can give you a better explanation.

Kubernetes multiple database instances or HA single instance

I have a Kubernetes environment running multiple applications (services). Now I'm a little bit confused about how to set up the MySQL database instance(s).
According to different sources each microservice should have its own database. Should I create a single MySQL StatefulSet in HA mode running multiple databases, OR should I deploy a separate MySQL instance for each application (service), running one database each?
My first thought would be the first option; otherwise, what would HA be useful for? I would like to hear some different views on this.
Slightly subjective question, but here's what we have set up. Hopefully that will help you build a case. I'm sure someone would have a different opinion, and that might be equally valid too:
We deploy about 70 microservices, each with its own database ("schema") and its own JDBC URL (defined via a service). Each microservice has its own endpoint and credentials that we do not share between microservices. So in effect, we have kept the design completely independent across the microservices as far as the schema is concerned.
Deployment-wise, however, we have opted to go with a single database instance for hosting all databases (or "schemas"). While technically we could deploy each database on its own database instance, we chose not to do it for a few main reasons:
Cost overhead: Running separate database instances for each microservice would add a lot of "fixed" costs. This may not be directly relevant to you if you are simply starting the database as a MySQL Docker container (we use a separate database service, such as RDS or Google Cloud SQL). But even in the case of MySQL as a Docker container, you might end up with a non-trivial cost if you run, for example, 70 separate containers, one per microservice.
Administration overhead: Given that databases are usually quite involved (disk space, IOPS, backup/archiving, purge, upgrades and other administration activities), having separate database instances -- or Docker container instances -- may put a significant toll on your admin or operations teams, especially if you have a large number of microservices.
Security: Databases are usually also critical when it comes to security, as the "truth" usually lives in the DB. Keeping encryption, TLS configuration and strength of credentials aside (as they should be of utmost importance regardless of your deployment model), security considerations, reviews, audits and logging will bring significant challenges if you have too many database instances.
Ease of development: Relatively less critical in the grand scheme of things, but significant nonetheless. Unless you are thinking of coming up with a different model for development (and thus breaking "dev-prod parity"), your developers may have a hard time figuring out the database endpoints for debugging, even if they only need that information once in a while.
So, my recommendation would be to go with a single database instance (Docker or otherwise), but keep the databases/schemas completely independent and inaccessible to any microservice but the "owner" microservice.
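As a sketch of what that isolation looks like on a single shared MySQL instance (the service name and password are illustrative):

    mysql -u root -p <<'SQL'
    CREATE DATABASE orders;
    CREATE USER 'orders_svc'@'%' IDENTIFIED BY 'orders-secret';
    -- the owner service gets privileges on its own schema only
    GRANT ALL PRIVILEGES ON orders.* TO 'orders_svc'@'%';
    SQL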
If you are deploying MySQL as Docker container(s), go with a StatefulSet for persistence. Define an external PVC so that you can always preserve the data, no matter what happens to your pods or even your cluster. Of course, if you run 'active-active' you will need to ensure clustering between your nodes, but we run it in 'active-passive' mode, so we keep the replica count at 1; we only use the MySQL Docker container alternative for our test environments, to save the cost of an external DBaaS service where it's not required.
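A minimal sketch of such a single-replica StatefulSet with a volumeClaimTemplate, so the PVC outlives the pod (names, image tag and storage size are illustrative; use a Secret for the password in practice):

    cat <<'EOF' | kubectl apply -f -
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: mysql
    spec:
      serviceName: mysql
      replicas: 1
      selector:
        matchLabels:
          app: mysql
      template:
        metadata:
          labels:
            app: mysql
        spec:
          containers:
          - name: mysql
            image: mysql:5.7
            env:
            - name: MYSQL_ROOT_PASSWORD
              value: changeme            # illustrative; mount a Secret instead
            ports:
            - containerPort: 3306
            volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
      volumeClaimTemplates:              # the PVC survives pod deletion
      - metadata:
          name: data
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
    EOF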

Docker - Should my MySQL database live inside or outside the container?

I'm setting up my first production server on Docker but I'm not sure where my MySQL database should live. Should the database live outside the container or within? I've read in some articles/posts that it should live outside so nothing changes if you have to fire up a new container or image, but I'm not sure if this is correct. Are there any speed/performance differences between having it inside or outside of the container?
These are some of the responsibilities of our Database Administrators:
Establish and maintain sound backup and recovery policies and procedures
Implement and maintain database security (create and maintain users and roles, assign privileges)
Perform database tuning and performance monitoring
Perform application tuning and performance monitoring
Setup and maintain documentation and standards
Plan growth and changes (capacity planning)
If I need any of these services I use a database outside of the container and hosted by specialists.
If the data needs to be accessed by other applications I use a database on a centralized database server outside of the container and hosted by specialists.
On performance: Docker containers use a virtual network interface by default, see Docker Advanced networking documentation. This comes with just a slight speed overhead. Still, depending on your expected load, you might want to either bind your DB container to the host network or not dockerize your DB at all.
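If that overhead matters for your load, a sketch of the host-network option (Linux only; MySQL then listens directly on the host's port 3306, and the password is illustrative):

    docker run -d --name mysql \
      --network host \
      -e MYSQL_ROOT_PASSWORD=changeme \
      mysql:5.7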
On data persistence: If you are using volumes or volume containers your data lives outside the container and can be mounted by any new container too. No worries here.
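For illustration, the classic data-only container pattern works roughly like this (container names are illustrative): a stopped data container still holds its volume, so the database container can be replaced freely.

    # Data-only container: never runs, just owns the volume.
    docker create --name mysql-data -v /var/lib/mysql busybox
    # The actual database container mounts the same volume.
    docker run -d --name mysql --volumes-from mysql-data \
      -e MYSQL_ROOT_PASSWORD=changeme mysql:5.7
    # The volume is only lost if mysql-data itself is removed with -v.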
On whether to use containers for DBs (my opinion): It is currently en vogue to containerize stateless and interchangeable applications, meaning that you can simply throw away outdated services and replace them with new containers. While this really makes sense for frequently updated microservices… do you really need this for a comparatively long-lived service like a database? Yes, Docker still helps to contain dependencies and ship stuff faster, but there are alternatives like Ansible-provisioned VMs. In the end it depends on what is easiest for your use case.

openshift: is it possible to create multiple mysql cartridges?

OpenShift offers scalability. However, it seems to me that if you are using MySQL, MySQL queries/hits will ultimately be the bottleneck (if you are lucky enough to generate traffic that needs scalability, and considering the max-connections limit on OpenShift).
Suppose I want to use OpenShift: is it possible to create multiple MySQL cartridges to balance the load, and to create dynamic environment variables to assign requests to the different MySQL cartridges? (Suppose I send an id or something, and the environment variable for MySQL is set to "dbname" + the last digit of this id.)
This is a simplified example which should multiply the database capacity by ten (if the data is unrelated). Can it be done?
I hope some OpenShift guy or girl will clarify this for me...
cheers
Edit: Thanks mbaird for your info:
To clarify:
I wasn't talking about auto-scaling, but about using (for instance) 11 static/persistent db cartridges which would never scale up or down.
Then you could store user information in any of them depending on (also for instance) the last digit of their id.
The 11th database cartridge could be used as a lookup table to get the user's id and then redirect that user to the right database (if last digit = 0, db = db0; if last digit = 1, db = db1; etc.). This would enable me to call the right database for the right user.
Of course, this is not auto-scaling but it would multiply the database capacity by (roughly) ten.
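The routing itself would be trivial; a sketch of the idea (variable and database names are hypothetical):

    user_id=4217
    db="db$((user_id % 10))"   # last digit of the id picks the database
    echo "user $user_id -> $db"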
However, this would require the ability to create multiple MySQL cartridges and corresponding environment variables to gain access to all of them.
It seems to me this is not possible right now, so I will investigate your suggestions.
The OpenShift database tier currently doesn't scale. Further, even if you could add a second MySQL cartridge, it wouldn't give you a scalable database; it would give you a new, empty database. What you are looking for is the ability to scale the MySQL cartridge across multiple gears, not adding another cartridge.
I've actually seen some comments from OpenShift (although I can't seem to find them now) that the databases on OpenShift are for development only and you should look for another service to host your database if you have a mission-critical application that requires database fail-over and scalability.
Since you are specifically using MySQL, I would look into using Amazon RDS (either MySQL or the new Aurora engine which is MySQL compatible) or ScaleDB.

MySQL cluster for dummies

So what's the idea behind a cluster?
You have multiple machines with the same copy of the DB, and you spread the reads/writes across them? Is this correct?
How does this work? When I make a SELECT query, does the cluster analyze which server has fewer reads/writes and point my query to that server?
When should you start using a cluster? I know this is a tricky question, but maybe someone can give me an example, like 1 million visits and a 100-million-row DB.
1) Correct, with one nuance: no single data node holds a full copy of the cluster data, but every single bit of data is stored on at least two nodes.
2) Essentially correct. MySQL Cluster supports distributed transactions.
3) When vertical scaling is not possible anymore, and replication becomes impractical :)
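To make point 1 concrete, here is a minimal sketch of an NDB cluster config (hostnames are illustrative; NoOfReplicas=2 means every fragment of the data lives on two data nodes):

    cat <<'EOF' > /var/lib/mysql-cluster/config.ini
    [ndbd default]
    NoOfReplicas=2
    [ndb_mgmd]
    HostName=mgmt.example.com
    [ndbd]
    HostName=data1.example.com
    [ndbd]
    HostName=data2.example.com
    [mysqld]
    HostName=sql1.example.com
    EOF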
As promised, some recommended readings:
Setting Up Multi-Master Circular Replication with MySQL (simple tutorial)
Circular Replication in MySQL (higher-level warnings about conflicts)
MySQL Cluster Multi-Computer How-To (step-by-step tutorial, it assumes multiple physical machines, but you can run your test with all processes running on the same machine by following these instructions)
The MySQL Performance Blog is a reference in this field
1 -> Your first point is correct in a way, but if multiple machines shared the same copy of the data it would be replication rather than clustering.
In clustering, the data is divided among the various machines by horizontal partitioning: the records are divided by rows and distributed among the machines using some algorithm.
The data is divided in such a way that each record gets a unique key, just as in a key-value pair, and each machine has a unique machine id which determines which key-value pairs go to which machine.
Each machine is a cluster node consisting of an individual mysql-server, its own data and a cluster manager, and there is data sharing between all the cluster nodes so that all the data is available to every node at any time.
Retrieval of data can be done through memcached devices/servers for fast access, and there can also be a replication server for a particular cluster to safeguard the data.
2 -> Yes, this is possible because all the data is shared among all the cluster nodes, and you can also use a load balancer to balance the load. Load balancers are quite common in front of most servers, but if you are only experimenting there is no need: you will not see the kind of load that creates the requirement for a load balancer, and the cluster manager itself can do the whole thing.
3 -> RandomSeed is right. You feel the need for a cluster when your replication becomes impractical: if you are using the master server for writes and slaves for reads, then at some point the traffic becomes so huge that the servers cannot work smoothly anymore, and that is when you will feel the need for clustering, simply to speed up the whole process.
This is not the only case; it's just one scenario.
Hope this is helpful for you!