Segregating read-only and read-write database access in Spring/J2EE apps - MySQL

We are using Spring, Spring Data and JPA in our project.
For the production servers, we would like to set up a database cluster such that all read queries are directed to one server and all write queries are directed to another server.
This obviously will require some changes in the way the DAOs are built.
Does anyone know how to achieve this if one has, so far, been following cookbook-style DAO creation using Spring Data/JPA, where a DAO implementation is responsible for both reads and writes? What kind of architectural changes will be needed to segregate the two types of calls?

When using MySQL, Java developers commonly use Connector/J as the JDBC driver, registering the com.mysql.jdbc.Driver class and connecting with a URL such as jdbc:mysql://host[:port]/database.
Connector/J offers another driver called ReplicationDriver that allows an application to load-balance between multiple MySQL hosts. When using ReplicationDriver, the JDBC URL changes to jdbc:mysql:replication://master-host[:master-port][,slave-1-host[:slave-1-port]][,slave-2-host[:slave-2-port]]/database. This allows the application to connect to one of multiple servers depending on which one is available at any given point in time.
The ReplicationDriver treats the first host declared in the URL as the read-write master and all remaining hosts as read-only slaves. Whenever a JDBC connection is set to read-only, the driver routes its queries to one of the slave hosts; otherwise they go to the master. Developers can take advantage of this in a Spring application by structuring their code as follows:
@Service
@Transactional(readOnly = true)
public class SomeServiceImpl implements SomeService {

    public SomeDataType readSomething(...) { ... }

    @Transactional(readOnly = false)
    public void writeSomething(...) { ... }
}
With code like this, whenever readSomething is called, the Spring transaction management code obtains a JDBC Connection and calls setReadOnly(true) on it, because the class-level @Transactional(readOnly = true) annotation applies to all service methods by default. This makes all database queries issued from readSomething go to one of the non-master MySQL hosts, load-balanced in a round-robin fashion. Similarly, whenever writeSomething is called, Spring calls setReadOnly(false) on the underlying JDBC Connection, forcing its queries to go to the master server.
This strategy allows the application to direct all read-only traffic to one set of MySQL servers and all read-write traffic to a different server, without changing the application's logical architecture or requiring developers to worry about different database hosts and roles.
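For reference, here is a minimal sketch of how such a data source could be wired up, assuming a plain Spring @Configuration class; the pool choice, host names and credentials below are placeholders rather than anything prescribed above:

import javax.sql.DataSource;
import com.mysql.jdbc.ReplicationDriver;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.SimpleDriverDataSource;

@Configuration
public class ReplicationDataSourceConfig {

    // Placeholder hosts and credentials; replace with your own.
    private static final String URL =
            "jdbc:mysql:replication://master-host:3306,slave-1-host:3306,slave-2-host:3306/mydb";

    @Bean
    public DataSource dataSource() {
        // SimpleDriverDataSource keeps the example short; in production you
        // would normally wrap the ReplicationDriver in a connection pool.
        SimpleDriverDataSource ds = new SimpleDriverDataSource();
        ds.setDriver(new ReplicationDriver());
        ds.setUrl(URL);
        ds.setUsername("app_user");
        ds.setPassword("app_password");
        return ds;
    }
}

The routing decision is made entirely from the readOnly flag Spring sets on each transaction; the DAOs themselves need no knowledge of the master or slave hosts.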

Well, what you are talking about is actually called CQRS (http://martinfowler.com/bliki/CQRS.html).
I would suggest reading up on the concept before attempting to implement it.
As for your question, for a quick first win I would suggest starting by dividing the DAL's services into Finder classes and Repository classes, which will be used by higher-level, business-oriented services.
Finders handle read-only access, exposing getBy...() methods and lookups that return custom result objects such as reports; their underlying implementation is tailored to work against the read-only database.
Repositories, on the other hand, handle writes plus getById() lookups, and their underlying implementation is tailored to work against the write database.
The only thing left is the synchronisation between those databases.
This can be achieved with fairly standard technical solutions such as database replication, or deferred updates to the read-only database after changes are made to the write database (eventual consistency).
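As a minimal sketch of that split, with made-up placeholder types standing in for a real entity and read model, the two sides might look like this:

import java.time.LocalDate;
import java.util.List;

// Minimal placeholder types for the sketch.
class User { long id; String name; String username; }
class UserSummary { long id; String name; }

// Read side: used by query/reporting services; its implementation is bound
// to the read-only database.
interface UserFinder {
    UserSummary getByUsername(String username);
    List<UserSummary> findRegisteredBetween(LocalDate from, LocalDate to);
}

// Write side: used by command services; its implementation is bound to the
// master (write) database.
interface UserRepository {
    User getById(long id);
    void save(User user);
    void delete(long id);
}

Higher-level business services then depend on UserFinder for queries and UserRepository for commands, which is what lets the two implementations point at different databases.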

How to connect to an arbitrary database using FaaS?

I just did some reading about serverless computing and FaaS. If using FaaS to access an arbitrary database, we need to establish and close a database connection each time. In, let's say, a Node application, we would usually establish the connection once and reuse it for multiple requests.
Correct?
I have a hosted MongoDB at mLab and thought about implementing a REST API with Google's Cloud Functions service. I don't know how to handle the database connection efficiently.
For sure, things get clearer while coding and testing, but I would like to know the chances of success before spending a lot of time.
Thanks
Stefan
Serverless platforms reuse the underlying containers between distinct function invocations whenever possible. Hence you can set up a database connection pool in the global function scope and reuse it for subsequent invocations - as long as the container stays warm. GCP has a guide here using MySQL but I imagine the same applies to MongoDB.
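As a small sketch of that pattern on the Java runtime of Cloud Functions with the MongoDB Java driver (the connection string, database and collection names are placeholders, not anything specific to the guide mentioned above):

import com.google.cloud.functions.HttpFunction;
import com.google.cloud.functions.HttpRequest;
import com.google.cloud.functions.HttpResponse;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;

public class UserApi implements HttpFunction {

    // Created once per container instance and reused across invocations
    // while the container stays warm. The URI is a placeholder.
    private static final MongoClient CLIENT =
            MongoClients.create("mongodb://user:pass@example-host:27017/mydb");

    @Override
    public void service(HttpRequest request, HttpResponse response) throws Exception {
        MongoDatabase db = CLIENT.getDatabase("mydb");
        long users = db.getCollection("users").countDocuments();
        response.getWriter().write("user count: " + users);
    }
}

The important part is that the client lives in a field initialized at class-load time rather than inside service(), so a warm container reuses the same connection pool; a cold start still pays the connection cost once.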

Microservices centralized database model

Currently we have some microservices; each has its own database model and migrations, provided by the GORM Golang package. We have a big old MySQL database, which goes against microservice principles, but we can't replace it. I'm afraid that when the number of microservices starts growing, we will get lost among the many database models. When I add a new column in a microservice, I just type service migrate in the terminal (there is a CLI for the run and migrate commands), and it refreshes the database.
What is the best practice for managing this? For example, if I have 1000 microservices, no one will type service migrate whenever someone refreshes the models. I am thinking about a centralized database service, where we just add a new column and it stores all the models with all migrations. The only problem is: how will the services get to know about database model changes? This is how we store, for example, a user in a service:
type User struct {
    ID       uint           `gorm:"column:id;not null" sql:"AUTO_INCREMENT"`
    Name     string         `gorm:"column:name;not null" sql:"type:varchar(100)"`
    Username sql.NullString `gorm:"column:username;not null" sql:"type:varchar(255)"`
}

func (u *User) TableName() string {
    return "users"
}
Depending on your use cases, MySQL Cluster might be an option. The two-phase commits used by MySQL Cluster make frequent writes impractical, but if write performance isn't a big issue, I would expect MySQL Cluster to work out better than connection pooling or queuing hacks. Certainly worth considering.
If I'm understanding your question correctly, you're trying to keep using one MySQL instance but with many microservices.
There are a few ways to make an SQL system work here:
You could create a microservice type that handles data inserts/reads from the database, taking advantage of connection pooling, and have the rest of your services do all their data reads/writes through these services. This will definitely add a bit of extra latency to all your reads/writes and will likely be problematic at scale.
You could look for a multi-master SQL solution (e.g. CitusDB) that scales easily; you can use a central schema for your database and just make sure to handle edge cases for data insertion (de-duping etc.).
You can use data-streaming architectures like Kafka or AWS Kinesis to transfer your data to your microservices and make sure they only deal with data through these streams. This way, you can decouple your database from your data.
In my opinion, the best way to approach it is #3. This way, you won't have to think about your storage at the computation layer of your microservice architecture.
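As a very rough sketch of option #3 using Kafka's Java client, a service could publish entity changes to a topic instead of writing to the shared database directly; the topic name, broker address and JSON payload here are all placeholders:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class UserEventPublisher {

    private final KafkaProducer<String, String> producer;

    public UserEventPublisher() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-broker:9092"); // placeholder address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        producer = new KafkaProducer<>(props);
    }

    // Publish a user-changed event; downstream services consume the topic and
    // update their own stores instead of touching a shared database schema.
    public void publishUserChanged(String userId, String userJson) {
        producer.send(new ProducerRecord<>("user-events", userId, userJson));
    }
}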
Not sure what service you're using for your microservices, but StdLib enforces a few conventions (e.g. only transferring data through HTTP) that help folks wrap their heads around it all. AWS Lambda also works very well with Kinesis as a source to launch the function, which could help with the #3 approach.
Disclaimer: I'm the founder of StdLib.
If I understand your question correctly, it seems to me that there may be multiple ways to achieve this.
One solution is to have a schema version somewhere in the database that your microservices periodically check. When your database schema changes, you increase the schema version. If a service then notices that the database schema version is higher than the schema version it expects, it can migrate the schema in code, which GORM allows.
Other options depend on how you run your microservices. For example, if you run them on an orchestration platform (e.g. Kubernetes), you could put the migration code somewhere that runs when your service initializes. Then, once you update the schema, you can force a rolling refresh of your containers, which would in turn trigger the migration.
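A bare-bones sketch of that schema-version check with plain JDBC (the table and column names, expected version and connection details below are all made up for illustration; in the GORM setup above the same check would simply sit in front of the service's auto-migration):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SchemaVersionCheck {

    // The schema version this build of the service knows about.
    private static final int KNOWN_VERSION = 42;

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://db-host:3306/app", "user", "pass"); // placeholders
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                "SELECT MAX(version) FROM schema_version")) {      // assumed table

            int current = rs.next() ? rs.getInt(1) : 0;
            if (current > KNOWN_VERSION) {
                // The central schema moved ahead of this service: trigger the
                // service's migration (its "service migrate" equivalent) so
                // its models catch up.
                System.out.println("schema moved to " + current + ", migrating");
            }
        }
    }
}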

master-slave in dropwizard and hibernate

I have an application written in Dropwizard, using Hibernate to connect to the DB (MySQL). Due to new features being released, I am expecting high load on the read APIs and am thinking of serving those reads from a slave DB. What are the different ways in which I can configure master-slave, and what are the trade-offs?
The way I have solved it:
I have two session factories in my case: one is the default, which talks to the master, and the other, with a name such as "slaveDb", talks to the slave database.
I have created different DAOs for the same entity: one for slave interactions and one for the master. The slave DAO is bound to the slave session factory.
The @UnitOfWork annotation has a "value" attribute. If you don't set it (which we don't in many cases), the annotation processor works on top of the default session factory. If you give a name there, the annotation processor will use the session factory with that particular name.
P.S. In my case I have a single slave, as the application is not under that much load and I wanted the slave only for report generation. With many slaves this solution doesn't scale well. Also, as I was giving the slave machine's details in my config.yaml file, I did not need to set the underlying connection as read-only.
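A rough sketch of that arrangement, assuming the slave session factory is registered under the name "slaveDb"; the entity, DAO and resource names here are made up:

import io.dropwizard.hibernate.AbstractDAO;
import io.dropwizard.hibernate.UnitOfWork;
import org.hibernate.SessionFactory;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;

// Placeholder entity for the sketch; in reality a mapped Hibernate entity.
class User { long id; String name; }

// DAO constructed with the slave session factory.
class UserReadDAO extends AbstractDAO<User> {
    UserReadDAO(SessionFactory slaveSessionFactory) {
        super(slaveSessionFactory);
    }

    User findById(long id) {
        return get(id);
    }
}

@Path("/users")
class UserResource {
    private final UserReadDAO readDao;

    UserResource(UserReadDAO readDao) {
        this.readDao = readDao;
    }

    // Naming the bundle in @UnitOfWork makes Dropwizard open the session
    // against the "slaveDb" session factory instead of the default one.
    @GET
    @Path("/{id}")
    @UnitOfWork("slaveDb")
    public User getUser(@PathParam("id") long id) {
        return readDao.findById(id);
    }
}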
If you are using the @UnitOfWork annotation, then no and yes.
No, it does not directly allow you to communicate with the DB as read-only.
Yes, you can create two resources, each using a different DB (master/slave).
One resource for writes and critical reads (master), another for read-only access (slave).
https://groups.google.com/forum/#!topic/dropwizard-user/nxURxVWDtEY
Also, as the link suggests, the MySQL driver can do this automatically, but for that the session's readOnly flag should be true, which UnitOfWorkApplicationListener does not set properly even if you set readOnly = true in @UnitOfWork.

Creating a .NET MVC web app with a Mirrored Database for HA

I am writing my first .NET MVC application and I am using the Code-First approach. I have recently learned how to configure two SQL Server installations for High Availability using a Mirror Database and a Witness (not to be confused with Failover Clusters) to handle the failover process. I think this would be a great time to practice both things by mounting my web app on a highly-available DB.
Now, from what I've learned (correct me if I'm wrong), in the mirror configuration the witness fails over to the secondary DB if the first one goes down... but your application will also need to change the connection string to reference the secondary server.
What is the best approach to have both addresses in the Web.config (or somewhere else) and choosing the right connection string?
I have zero experience with connecting to mirrored databases, so this is all hearsay! :)
The short of it is that you may not have to do anything special, as long as you pass along the FailoverPartner attribute in your connection string. The long of it is that you may need additional error handling to attempt a new connection, so the data provider will actually use the FailoverPartner name in the new connection.
There seems to be some good information with Connecting Clients to a Database Mirroring Session to get started. Have you had a chance to check that out?
If not, it's there under Making the Initial Connection, where they introduce the FailoverPartner attribute of the ConnectionString property.
Reconnecting to a Database Mirroring Session suggests that on any client disconnect due to failover, the client will need to trap this exception and be prepared to reconnect:
The application must become aware of the error. Then, the application needs to close the failed connection and open a new connection using the same connection string attributes.
If the FailoverPartner attribute is available, this process should be relatively transparent to the client.
If the above doesn't work, then you might need to actually introduce some logic at the application tier to track who is the primary node, the failover node, and connection strings for each, and be prepared to persist that information somewhere - much like the data access provider should be doing for us (eyes wide open).
There is also this ServerFault post on database mirroring with Sql Server that might be of interest from an operational viewpoint that has additional reference information.
Hopefully someone with actual experience will back up any of this!
This may be totally off base, but what if you had a load balancer between your web server and the database servers?
The load balancer would have both databases in its pool, using basic health check techniques (e.g. ping, etc.).
Your configuration would then only need to point to the IP of the Load Balancer, and wouldn't need to change.
This is what these network devices are good for. It's not the job of the programming framework (ASP.NET) to make decisions on the health of servers.

Partial database synchronization between secure and unsecure site?

I've been working with a couple different contact management solutions, but my security paranoia prevents me from wanting to put any sensitive data in the cloud. While I'm not dealing with anything like credit transactions that have detailed security requirements, I am considering handling sensitive personal data that I wouldn't want unauthorized people to have access to.
I'm considering a model where there are two systems: one on a Firewalled LAN that contains the sensitive data, which can only be accessed by those on the lan. A second system would contain the non-sensitive (phone book) information subset of the secure system. I would like changes to the non-sensitive data to be synchronized by a connection initiated by the secure system, so as not to compromise the firewall.
I understand that master-slave replication could probably provide synchronization from the secure system to the public system; however, this would seem to overwrite any changes made to the public database.
Now to my questions:
1) Does this security model make sense? Are there any downsides or inherent problems with this approach that I might be overlooking?
2) Is there an off-the-shelf or some example of how to implement this type of system?
3) If no, how would you go about implementing this? (language suggestions, synch algorithms, etc)
Assume a LAMPP configuration, with no persistent processes allowed (except for Apache & MySQL) on the public system.
1) This sounds like it'll end up being a lot of pain. Are you sure you want the public system being able to push changes to the private one?
2) MySQL replication with the public system as the master should do the trick; since the slave initiates the replication connection, the secure system is the one connecting out to the public system. You'll want to use the --replicate-do-* options to restrict which statements the slave will execute. To avoid breaking replication, you'll need to ensure that any SQL affecting tables you care about doesn't reference tables the slave doesn't know about.