Do views get shared between all servers in a Couchbase cluster? - couchbase

I have a production server and a locally ran server (Couchbase).
I have some views on the development side of the production server I need to sync with my local server.
Do all the views get sync'd across the Cluster? (development views and/or production views?)

Views get synchronised across all nodes in a cluster.
If you have two independent clusters then you'll need to duplicate your design documents on both clusters. This is the case even if you are using XDCR to synchronise the bucket contents between clusters.

Related

Data transfer between local server and AWS server

We have an ASP.NET MVC 5 web application that reads data locally from within the same server. This server is in Europe. However when trying to read the same data from an AWS server based in Sidney the lag is many times greater. A ping from our local server to the AWS server in Australia takes 5 seconds. The data needs to be located in Australia because of data protection laws issued by the Australian Government. The database is MySQL. We have created a VPN between both servers and made no difference.
What are our options in order to improve the speed between these two servers?
If it is a web application serving content to users over internet you can use CloudFront distribution to reduce your latency issues.
https://aws.amazon.com/cloudfront/
If you are trying to connect your servers from your data center to AWS
Use AWS Direct Connect, this will provide a dedicated link between your on-premise datacenter and to the AWS Servers; Decreasing your latency by a lot.
https://aws.amazon.com/directconnect/
AWS runs your application regardless of which platform(ASP.NET, JAVA, C...) it is, AWS only provisions infrastructure. You don't need to worry about the platform on which your application is running and what database it connects to. You just need to ensure that all the network connections are properly open so that your servers can communicate with AWS servers.

Is it recommended to run clustered database with Kubernetes in production environment?

Is it reasonable to use Kubernetes for a clustered database such as MySQL in production environment?
There are example configurations such as mysql galera example. However, most examples do not make use of persistent volumes. As far as I've understood persistent volumes must reside on some shared file system as defined here Kubernetes types of persistent volumes. A shared file system will not guarantee that the database files of the pod will be local to the machine hosting the pod. It will be accessed over network which is rather slow. Moreover, there are issues with MySQL and NFS, for example.
This might be acceptable for a test environment. However, what should I do in a production environment? Is it better to run the database cluster outside Kubernetes and run only application servers with Kubernetes?
The Kubernetes project introduced PetSets, a new pod management abstraction, intended to run stateful applications. It is an alpha feature at present (as of version 1.4) and moving rapidly. A list of the various issues as we move to beta are listed here. Quoting from the section on when to use petsets:
A PetSet ensures that a specified number of "pets" with unique identities are running at any given time. The identity of a Pet is comprised of:
a stable hostname, available in DNS
an ordinal index
stable storage: linked to the ordinal & hostname
In addition to the above, it can be coupled with several other features which help one deploy clustered stateful applications and manage them. Coupled with dynamic volume provisioning for example, it can be used to provision storage automatically.
There are several YAML configuration files available (such as the ones you referenced) using ReplicaSets and Deployments for MySQL and other databases which may be run in production and are probably being run that way as well. However, PetSets are expected to make it a lot easier to run these types of workloads, while supporting upgrades, maintenance, scaling and so on.
You can find some examples of distributed databases with petsets here.
The advantage of provisioning persistent volumes which are networked and non-local (such as GlusterFS) is realized at scale. However, for relatively small clusters, there is a proposal to allow for local storage persistent volumes in the future.

AWS RDS read replicas interaction with application

I am very new to cloud computing. I have never worked with MySQL outside of 1 instance. I am trying to understand how AWS RDS read replicas work with my application. For example say I have 1 master and 2 read replicas. I then from my application server send the query to AWS:
SELECT * FROM users where username = 'bob';
How does this work now? Do I need to include more into my code to choose a certain read replica or does AWS automatically reroute the request or how does it work?
Amazon does not currently provide any sort of load balancing or other traffic distribution across RDS servers. When you send queries to the primary RDS endpoint, 100% of that traffic goes to the primary RDS server. You would have to architect your system to open connections to each server and distribute the queries across the different database servers.
To do this in a way that is transparent to your application, you could setup an HAProxy instance between your application and the database that manages the traffic distribution.
Use of Elastic Load Balancers to distribute RDS traffic is an often requested feature, but Amazon has given no indication that they are working on this feature at this time.

Database synchronization on Azure with different Affinity Groups

From time to time, we have the need to copy/replicate production database into development database. Database being quite big, and servers on different affinity group.
What's the best way to do it without incurring in traffic costs (if at all possible) ? Development and Production ideally should be on different Affinity Group but makes this scenario harder.
If they were on same affinity group, can I connect like they were on the same LAN ?
As you're talking about an affinity group I'm assuming you're using a database server on a Virtual Machine.
Traffic costs only apply when sending data out of the data centre, so irrespective of whether the servers are on the same affinity group or not you will not incur data transfer costs as long as you deploy everything to the same Windows Azure dada centre.
If you deployed all the virtual machines on the same virtual network they will simply see each other on the network.
Another option, which does not require the servers being on the same network (or even on the same account), when using SQL Server, is to export the database to blob storage and import it on the other server.
This would also be the mechanism I would use if you were using Windows Azure SQL Database

Manual deployment vs. Amazon Elastic Beanstalk

What are the advantages we get by using Elastic Beanstalk over maually creating EC2 instance and setting up tomcat server and deploy etc for a typical java web applicaion. Are load balancing, Monitoring and autoscaling the only advantages?
Suppose for my web application which uses database I installed the database in the EC2 instance itself. When Autoscalling takes place will the database gets created in the newly created instance or it will be accessing the database I created in the master instance... If it creates just a replica when autoscaling happens how will be data sync happens between the instances?
All the things you mentioned like load balancing, monitoring and auto-scaling are definitely advantages.
However, you have to kind of think about it this way: In a true Platform as a Service (PAAS), the goal is to separate the application from the platform. As a developer, you only worry about your application. The platform is "rented" to you. The platform "instances" are automatically updated, administered, scaled, balanced, etc. for you. You just upload your WAR file and it just works (at least theoretically).
EC2 by itself is not PAAS. It is more like IAAS (Infrastructure as a Service). You still have to take care of the server instances, install software on them, keep them updated, etc.
Elastic Beanstalk is a PAAS system. So are App Engine and Azure among many others.
In a true PAAS system, the DBMS is a separate component from the web application server(s). The reason is obvious: The DBMS cannot be possibly installed on the instances that are being used for the application server because, as instances are created and destroyed based on your traffic, the DBMS would be lost! Having the DBMS and application server on the same machine/instance is not generally a good idea anyway.
In a PAAS system, the DBMS is a separate service. For Amazon, it would be Amazon RDS. Just like with Elastic Beanstalk, where you don't have to worry about the application server and you just upload your WAR file, with RDS, you don't have to worry about the DBMS and you just deploy your database(s).
Elastic Beanstalk and RDS work very well together, especially when deployed in the same availability zone, where the latency would be very low.
Finally, using Elastic Beanstalk doesn't cost anything more than the deployed resources (EC2 instances and the load balancer). However, RDS is not cheap and would definitely be more expensive than using a single EC2 instance for both the application server and the DBMS.
Elastic Beanstalk does more than just load balancing, monitoring, and autoscaling.
1) Manages application versions by storing and managing different versions of your application, allowing you to easily switch back and forth between different versions of your applications.
2) Has the concept of "environments" for each application, allowing you to deploy different versions of your application in each environment. This is handy for example if you want to set up separate QA and DEV environments, and you want to easily deploy a build first in DEV then deploy the same version of the application in QA when your QA team is ready for the next build.
3) Externalizes the important container configuration properties (Tomcat memory settings, for example) to the Elastic Beanstalk console and API. Because of this you can easily save the settings and copy them between environments.
4) View application log files through the console and automatically roll and archive log files to S3. (Admittedly this feature is currently a little weak.)
I had an app deployed both in EC2 dedicated(Nginx & Gunicorn) and Beanstalk Environment(CentOS & Apache2).
My observations:
BeanStalk is Paas. Manually creating an EC2 instance(IAAS), is like doing everything from scratch, but you have solid control.
BeanStalk comes with by default CentOS and Apache(Httpd). You could choose OS in dedicated instance.
These things that mattered to me,
There were lots of 504 errors showing up in Beanstalk environment.
It was difficult to debug when BeanStalk server crashed, as logs would also not show up and could not ssh into machine. This is very important.
Installing/configuring tools like Celery, Redis (need to run another port) etc.,. in dedicated instance is lot more easier.
In my case, I had to scale up (Beanstalk)server in order to run installation of some packages(like pandoc). These things are more simpler in Ubuntu.
Scaling is a lot more easier in BeanStalk. Cloning servers is straightforward in BeanStalk.
I had taken micro in both the cases (dedicated & Beanstalk). I felt dedicated micro instance was better.
Automated deployment in Beanstalk. I had to write scripts to automate the same, which is fine, since it is only once.