I'm setting up my first production server on Docker, but I'm not sure where my MySQL database should live. Should the database live outside the container or within it? I've read some articles and posts saying it should live outside, so that nothing changes if you have to fire up a new container or image, but I'm not sure whether this is correct. Are there any speed/performance differences between having it inside and outside the container?
These are some of the responsibilities of our Database Administrators:
Establish and maintain sound backup and recovery policies and procedures
Implement and maintain database security (create and maintain users and roles, assign privileges)
Perform database tuning and performance monitoring
Perform application tuning and performance monitoring
Set up and maintain documentation and standards
Plan growth and changes (capacity planning)
If I need any of these services I use a database outside of the container and hosted by specialists.
If the data needs to be accessed by other applications I use a database on a centralized database server outside of the container and hosted by specialists.
On performance: Docker containers use a virtual network interface by default (see the Docker advanced networking documentation). This comes with only a slight speed overhead. Still, depending on your expected load, you might want to either bind your DB container to the host network or not dockerize your DB at all.
On data persistence: If you are using volumes or volume containers your data lives outside the container and can be mounted by any new container too. No worries here.
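As a rough illustration of the two points above (a named volume for persistence, and optionally the host network for performance), the sketch below only assembles a `docker run` command line; it does not execute it. The container name, volume name, image tag, and password placeholder are assumptions made for the example, not values from the question.

```python
def mysql_run_command(name="mysql-prod", volume="mysql-data",
                      image="mysql:8.0", host_network=False):
    """Build (but do not execute) a docker run command for MySQL.

    All default names here are hypothetical placeholders.
    """
    cmd = ["docker", "run", "-d", "--name", name,
           # Named volume: the data directory outlives the container.
           "-v", f"{volume}:/var/lib/mysql",
           # Placeholder only; supply a real secret by other means.
           "-e", "MYSQL_ROOT_PASSWORD=change-me"]
    if host_network:
        # Skip the virtual bridge and share the host's network stack.
        cmd += ["--network", "host"]
    else:
        # Publish the MySQL port through the default bridge network.
        cmd += ["-p", "3306:3306"]
    cmd.append(image)
    return cmd


print(" ".join(mysql_run_command(host_network=True)))
```

Because the data sits in the named volume, removing the container and starting a new one with the same `-v` argument should pick up the existing data directory.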
On whether to use containers for DBs (my opinion): It is currently en vogue to containerize stateless and interchangeable applications, meaning that you can simply throw away outdated services and replace them with new containers. While this really makes sense for frequently updated microservices… do you really need this for a comparatively long-lived service like a database? Yes, Docker still helps to contain dependencies and ship stuff faster, but there are alternatives like Ansible-provisioned VMs. In the end it depends on what is easiest for your use case.
Related
I have a Kubernetes environment running multiple applications (services). Now I'm a little confused about how to set up the MySQL database instance(s).
According to different sources, each microservice should have its own database. Should I create a single MySQL StatefulSet in HA mode running multiple databases, or should I deploy a separate MySQL instance for each application (service), each running one database?
My first thought would be the first option; otherwise, what would HA be useful for? I would like to hear some different views on this.
Slightly subjective question, but here's what we have set up. Hopefully, that will help you build a case. I'm sure someone would have a different opinion, and that might be equally valid too:
We deploy about 70 microservices, each with its own database ("schema") and its own JDBC URL (defined via a service). Each microservice has its own endpoint and credentials that we do not share between microservices. So in effect, we have kept the design completely independent across the microservices as far as the schema is concerned.
Deployment-wise, however, we have opted to go with a single database instance for hosting all databases (or "schemas"). While technically we could deploy each database on its own database instance, we chose not to do so for a few main reasons:
Cost overhead: Running separate database instances for each microservice would add a lot of "fixed" costs. This may not be directly relevant to you if you are simply starting the database as a MySQL Docker container (we use a separate database service, such as RDS or Google Cloud SQL). But even in the case of MySQL as a Docker container, you might end up with a non-trivial cost if you run, for example, 70 separate containers, one per microservice.
Administration overhead: Given that databases are usually quite involved (disk space, IOPS, backup/archiving, purge, upgrades and other administration activities), having separate database instances -- or Docker container instances -- may take a significant toll on your admin or operations teams, especially if you have a large number of microservices.
Security: Databases are usually also critical when it comes to security, as the "truth" usually goes in the DB. Keeping encryption, TLS configuration and strength of credentials aside (as they should be of utmost importance regardless of your deployment model), security considerations, reviews, audits and logging will bring significant challenges if you have too many database instances.
Ease of development: Relatively less critical in the grand scheme of things, but significant, nonetheless. Unless you are thinking of coming up with a different model for development (and thus breaking the "dev-prod parity"), your developers may have a hard time figuring out the database endpoints for debugging even if they only need that information once-in-a-while.
So, my recommendation would be to go with a single database instance (Docker or otherwise), but keep the databases/schemas completely independent and inaccessible to any microservice other than the "owner" microservice.
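One way to picture "one instance, one schema and one set of credentials per microservice" is a small generator for the provisioning statements. This is only a sketch: the service name and the password handling are assumptions, and the function just returns SQL text rather than connecting to a server.

```python
import re

# Conservative pattern for names that are safe to place inside backticks
# and quotes; MySQL user names are limited to 32 characters.
_NAME = re.compile(r"^[a-z][a-z0-9_]{0,31}$")


def provisioning_sql(service, password):
    """Return SQL giving one service its own schema and its own user."""
    if not _NAME.match(service):
        raise ValueError(f"unsafe service name: {service!r}")
    # Escape backslashes first, then single quotes, for the string literal.
    pw = password.replace("\\", "\\\\").replace("'", "''")
    return [
        f"CREATE DATABASE IF NOT EXISTS `{service}`;",
        f"CREATE USER IF NOT EXISTS '{service}'@'%' IDENTIFIED BY '{pw}';",
        # Privileges are scoped to the service's own schema only.
        f"GRANT ALL PRIVILEGES ON `{service}`.* TO '{service}'@'%';",
    ]


for statement in provisioning_sql("orders", "example-password"):
    print(statement)
```

Since each user is granted privileges on its own schema only, a service holding the `orders` credentials has no access to another service's schema on the same instance.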
If you are deploying MySQL as Docker container(s), go with a StatefulSet for persistence. Define an external PVC so that you can always preserve the data, no matter what happens to your pods or even your cluster. Of course, if you run 'active-active', you will need to ensure clustering between your nodes. We run it in 'active-passive' mode, so we keep the replica count at 1: we only use the MySQL Docker container alternative for our test environments, to save the cost of an external DBaaS service where it's not required.
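A minimal sketch of such a StatefulSet, written as a Python dictionary and printed as JSON (which `kubectl apply -f` accepts as well as YAML). It assumes the external PVC already exists under the hypothetical name `mysql-data-pvc`; the object name and image tag are also placeholders, and credentials (for example from a Secret) are deliberately left out.

```python
import json


def mysql_statefulset(name="mysql", claim="mysql-data-pvc",
                      image="mysql:8.0"):
    """Return a single-replica StatefulSet manifest as a dict.

    The claim name refers to a PVC assumed to be created separately,
    so deleting the StatefulSet does not delete the data.
    """
    labels = {"app": name}
    container = {
        "name": "mysql",
        "image": image,
        "ports": [{"containerPort": 3306}],
        # Mount the externally defined claim at MySQL's data directory.
        "volumeMounts": [{"name": "data", "mountPath": "/var/lib/mysql"}],
        # Root password and other env settings are omitted in this sketch.
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": 1,  # active-passive style: one pod at a time
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [container],
                    "volumes": [{
                        "name": "data",
                        "persistentVolumeClaim": {"claimName": claim},
                    }],
                },
            },
        },
    }


print(json.dumps(mysql_statefulset(), indent=2))
```

Referencing a pre-created claim is one reading of "external PVC"; a `volumeClaimTemplates` entry would be the usual alternative when running more than one replica.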
I have three different application environments: production, demo, and dev. In each, I have an RDS instance running MySQL. I have five tables that house data that needs to be the same across all environments. I am trying to find a way to handle this.
For security purposes, it's not best to allow demo and dev to access the production database, so putting the data there seems to be a bad idea.
All environments need read/write capabilities. Is there a good solution to this?
Many thanks.
For security purposes, it's not best to allow demo and dev to access the production database, so putting the data there seems to be a bad idea.
Agreed. Do not have your demo/dev environments access data from your production environments.
I don't know your business logic, but I cannot think of a case where dev/demo data needs to be "in sync" with production data, unless the dev/demo environment is also dependent on other "production assets". If that were the case, I would suggest duplicating that data into your other environments.
Usually, the data in your database would be dependent on the environment it's contained within.
For best security and separation of concerns, keep your environments segregated as much as possible. This includes (but is not limited to):
database data,
customer data,
images and other files
If data needs to be synchronized, create a script/program to perform that synchronization completely (db + all necessary assets). But do that as part of your normal development pipeline so it goes through dev+testing+qa etc.
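A synchronization script of that kind could look roughly like the sketch below. It uses Python's built-in `sqlite3` module purely as a stand-in so the example runs anywhere; with MySQL you would use a MySQL client library instead. The table names are hypothetical, and the copy is one-way (source overwrites target), which is a simplification.

```python
import sqlite3

# Hypothetical names for the shared tables; the question does not list them.
SHARED_TABLES = ["countries", "currencies"]


def sync_tables(source, target, tables=SHARED_TABLES):
    """Overwrite each listed table in target with the rows from source.

    Table names are interpolated into SQL, so they must come from a
    trusted, hard-coded list like SHARED_TABLES, never from user input.
    """
    for table in tables:
        rows = source.execute(f"SELECT * FROM {table}").fetchall()
        with target:  # commit on success, roll back on error
            target.execute(f"DELETE FROM {table}")
            if rows:
                # One "?" placeholder per column in the table.
                marks = ", ".join("?" * len(rows[0]))
                target.executemany(
                    f"INSERT INTO {table} VALUES ({marks})", rows)
```

Running this as a step in the normal pipeline keeps the copy auditable and avoids giving the dev or demo environment a live connection to production.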
The thing about RDS and database-level access is that you still manage the user credentials as you would on premises. From an AWS perspective, all you need to do to allow access is update the security groups of your MySQL RDS instances to allow the traffic, then give your application the credentials you have provisioned for it. I do agree it is bad practice to give production-level access to your dev or demo environments.
As for keeping the data the same, you can automate a nightly snapshot of the production database and create new instances from it. If your infrastructure is defined in CloudFormation or Terraform, you can supply the endpoint of the instance restored from the snapshot and spin up a new DEV or DEMO environment.
Amazon RDS creates a storage volume snapshot of your DB instance, backing up the entire DB instance and not just individual databases. You can create a DB instance by restoring from this DB snapshot. When you restore the DB instance, you provide the name of the DB snapshot to restore from, and then provide a name for the new DB instance that is created from the restore. You cannot restore from a DB snapshot to an existing DB instance; a new DB instance is created when you restore.
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_CreateSnapshot.html
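The snapshot-then-restore flow described above can be sketched as a sequence of AWS CLI invocations. The function below only builds the argument lists; the instance and snapshot identifiers are hypothetical, and a real restore would normally need further options (instance class, subnet group, security groups) that are omitted here.

```python
def snapshot_and_restore_commands(source_instance, snapshot_id, new_instance):
    """Return AWS CLI argument lists: snapshot, wait, then restore.

    Restoring always creates a new DB instance; it cannot target an
    existing one, which is why new_instance must be a fresh name.
    """
    create = ["aws", "rds", "create-db-snapshot",
              "--db-instance-identifier", source_instance,
              "--db-snapshot-identifier", snapshot_id]
    # Block until the snapshot is usable before restoring from it.
    wait = ["aws", "rds", "wait", "db-snapshot-available",
            "--db-snapshot-identifier", snapshot_id]
    restore = ["aws", "rds", "restore-db-instance-from-db-snapshot",
               "--db-instance-identifier", new_instance,
               "--db-snapshot-identifier", snapshot_id]
    return [create, wait, restore]


for cmd in snapshot_and_restore_commands(
        "prod-mysql", "prod-mysql-nightly", "dev-mysql-restored"):
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run` from a nightly job; printing them first makes it easy to review what would be executed.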
I would recommend using a fan-out system at the point of data capture, along with a snapshot.
Take a point-in-time snapshot (i.e. now), spin up test/dev databases from it, and then use an SQS->SNS->SQS fan-out architecture to push any new changes to the data to your other databases.
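To make the fan-out idea concrete, here is a small in-memory stand-in: one topic that copies every published change into a separate queue per environment. It does not use the SQS or SNS APIs at all; the class and the environment names are invented for illustration.

```python
from collections import deque


class Topic:
    """In-memory stand-in for a pub/sub topic with one queue per subscriber."""

    def __init__(self):
        self._queues = {}

    def subscribe(self, name):
        """Create and return a queue that receives every later message."""
        queue = deque()
        self._queues[name] = queue
        return queue

    def publish(self, message):
        """Deliver a copy of the message to every subscribed queue."""
        for queue in self._queues.values():
            queue.append(dict(message))


changes = Topic()
dev_queue = changes.subscribe("dev")
demo_queue = changes.subscribe("demo")

# A change captured once is delivered to each environment's queue.
changes.publish({"table": "countries", "op": "insert", "code": "NL"})
print(len(dev_queue), len(demo_queue))
```

Each environment then drains its own queue and applies the changes to its own database, so no environment needs credentials for another one.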
I am working with Docker containers on a Linux machine. I have to create a database Docker container, and I have chosen MySQL. I have three requirements:
Load balancing - the database contains a huge table with approx. 100 million records, so we need to share the table across multiple servers. To cater for this I have chosen MySQL Cluster. I need to distribute the data based on the shard key. The load balancing will be done by HAProxy.
Question: please correct me if I am wrong, or suggest a better solution.
Persistence - even if all the database containers die, it should be able to recover. For this I have planned to create a data-only Docker container.
Question: if the data-only Docker container dies, will this container be able to recover? Is there any change in the volume when it comes back up?
Availability - since there will be multiple SQL servers with the replica feature, even if one server dies another server will become primary.
Question: please correct me if I am wrong, or suggest a better solution.
Once upon a time, I remember when a database table with one million records was considered "big data"...
Before assuming you need to split your dataset across multiple machines I would highly suggest that you first get comfortable with running a single database within a Docker container. Given enough resources MySQL is quite capable of scaling up to 100 million records.
Docker is designed to isolate processes from others running on the same host. This creates challenges for monolithic applications which frequently have a software architecture involving multiple processes communicating to each other over some form of host based IPC (inter process communication). That does not mean they cannot be containerized, but a large multiprocess container looks and operates a lot like a virtual machine, implying that perhaps docker is a less optimal technological fit.
Before I get too negative, it's completely possible to run clustered MySQL using Docker. A couple of examples returned by Google:
http://galeracluster.com/2015/05/getting-started-galera-with-docker-part-1/
http://severalnines.com/blog/how-deploy-galera-cluster-mysql-using-docker-containers
My warning is that you see fewer examples of running these clusters across multiple Docker hosts, implying the use cases are currently mostly for demo or test.
I have been doing some research into servers for a website I want to launch. I thought of a server configuration with RAID 10, backed up to a NAS that also has a RAID 10 configuration. This should keep data safe in 99.99+% of cases.
My problem appeared when I thought about the need for a second server. If I ever require more processing power, and thus more storage for users, how can I connect a second server to my primary one and make them act as one as far as the database (MySQL) is concerned?
I mean, I don't want to replicate my first DB on the second server and load-balance the requests - I want to use just one DB (maybe external) and let both servers use it at the same time. Is this possible? And is backing up MySQL data to a NAS a viable option?
The most common configuration (once scaling up from a single box) is to put the database on its own server. In many web applications, the database is the bottleneck (rather than the web server); so the first hardware scale-up step tends to be to put the DB on its own server.
This also allows you to put additional security between the database and web server - firewalls are common; different user accounts etc. are pretty much standard.
You can then add web servers to the load balancer, all talking to the same database, as long as your database can keep up.
Having more than one web server also helps with resilience - you can have a catastrophic hardware event on one webserver and the load balancer will direct the traffic to the remaining machines.
Scaling the database server performance is a whole different story - though typically you use very beefy machines for the database, and relative lightweights for the web servers.
To add resilience to the database layer, you can introduce clustering - this is a fairly complex thing to keep running, but protects you against catastrophic failure of a single machine.
Yes, you can back up MySQL to a NAS.
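As a minimal sketch, assuming the NAS is mounted on the database server at a path such as `/mnt/nas/mysql-backups` (a hypothetical mount point), a dated dump can be written straight to it with `mysqldump`. The function below only builds the command; connection options and credentials are left out.

```python
from datetime import date
from pathlib import PurePosixPath


def nas_backup_command(nas_mount="/mnt/nas/mysql-backups", day=None):
    """Build a mysqldump command that writes a dated dump onto the NAS."""
    day = day or date.today()
    target = PurePosixPath(nas_mount) / f"all-databases-{day.isoformat()}.sql"
    return [
        "mysqldump",
        "--all-databases",
        # Consistent snapshot for InnoDB tables without locking them.
        "--single-transaction",
        f"--result-file={target}",
    ]


print(" ".join(nas_backup_command()))
```

Scheduling this nightly (for example from cron) gives one file per day on the NAS; old files would still need to be pruned separately.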
My partner and I are trying to start a website hosted in the cloud. It has pretty heavy AJAX traffic and the backend handles money transactions, so we need ACID in some of the DB tables.
Currently everything is running off a single server. Some of the AJAX traffic is cached in text files.
Question:
What's the best way to scale the database server? I thought about moving MySQL to separate instances and doing master-master replication. However, this seems tough, and I heard I might lose ACID properties even with InnoDB. Is Amazon RDS a good solution?
The web server is relatively stateless except for some custom log files and the AJAX cache files. What's a good way to scale to multiple web servers? I guess the custom log files can be moved to a reliable shared file system or DB, but I'm not sure what to do about AJAX cache file coherency across multiple servers. (I don't care about losing /var/log/* if a web server dies.)
For performance it might be cheaper to go with a larger instance with more cores and memory, but eventually I would need redundancy, so I'm wondering what's the best way to do this cheaply.
Thanks
Take a look at this post. There are plenty of presentations on the net discussing scalability. A few things I suggest keeping in mind:
plan early for data sharding [even if you are not going to do it immediately]
try using mechanisms like memcached to limit the number of queries sent to the database
prepare to serve static content from another domain; in the longer run, from an nginx-like server and later a CDN
redundancy - depends on your needs. Is 'read-only' mode acceptable for your site? If so, go with MySQL replication + rsync of static files, and in case of failover have your site work in that mode until you recover the master node. If you need high availability, take a look either at DRBD replication [at least for MySQL] or at a setup with automated promotion of a slave server to become the master node.
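On the "plan early for data sharding" point in the list above: even before any sharding happens, it helps to route every lookup through one function that maps a shard key to a shard. The sketch below uses a simple hash-modulo scheme; the choice of key (a user id) and the shard count are assumptions for illustration.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a shard key to a shard index in the range [0, shard_count)."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # A stable hash, unlike Python's built-in hash(), gives the same
    # result across processes and restarts.
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# With a single shard today, every key maps to shard 0.
print(shard_for("user-12345", 1))
```

With hash-modulo routing, changing `shard_count` later remaps most keys, so a lookup table or consistent hashing is worth considering if the number of shards is expected to grow.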
You might find the following interesting:
http://yoshinorimatsunobu.blogspot.com/2011/08/mysql-mha-support-for-multi-master.html
http://mysqlperformanceblog.com
http://highscalability.com
http://google.com - search for scalability, lamp, failover... there are tons of case studies and horror stories from the trenches :-]
Another option is using a scalable platform such as Amazon Web Services. You can start out with a micro instance and configure load balancing to fire up more instances as needed.
Once you determine your average resource requirements, you can resize your instance up or down depending on your needs.
http://aws.amazon.com
http://tuts.pinehead.tv/2011/06/26/creating-an-amazon-ec2-instance-with-linux-lamp-stack/
http://tuts.pinehead.tv/2011/09/11/how-to-use-amazon-rds-relation-database-service-to-host-mysql/
Amazon allows you to either load balance or change instance size based on demand.