I have an application written in Dropwizard, using Hibernate to connect to a MySQL database. With new features being released, I am expecting high load on the read APIs and am thinking of serving reads from a slave DB. What are the different ways in which I can configure master-slave, and what are the trade-offs?
The way I solved it:
I have two session factories: the default one talks to the master, and a second one, named "slaveDb", talks to the slave database.
I created separate DAOs for the same entity: one for slave interactions and one for the master. The slave DAO is bound to the slaveSessionFactory.
The @UnitOfWork annotation has a "value" attribute. If you don't set it (which is the common case), the annotation processor works on top of the default session factory; if you give it a name, the processor uses the session factory registered under that name.
P.S. In my case there is a single slave, since the application is not under that much load and I wanted the slave only for report generation; with many slaves this solution doesn't scale well. Also, because I put the slave machine's details in my config.yaml, I did not need to mark the underlying connection as read-only.
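A rough sketch of how that wiring can look (Dropwizard 1.x/2.x style imports; AppConfig, Report, ReportReadDao and ReportResource are illustrative names, and getSlaveDatabase() is an assumed getter for a second database block in config.yaml, so adapt everything to your own setup):

import io.dropwizard.Application;
import io.dropwizard.db.DataSourceFactory;
import io.dropwizard.hibernate.AbstractDAO;
import io.dropwizard.hibernate.HibernateBundle;
import io.dropwizard.setup.Bootstrap;
import io.dropwizard.setup.Environment;
import org.hibernate.SessionFactory;
import java.util.List;

public class ReportApp extends Application<AppConfig> {

    // Default bundle: registered under the default name, keeps talking to the master.
    private final HibernateBundle<AppConfig> masterBundle =
            new HibernateBundle<AppConfig>(Report.class) {
                @Override
                public DataSourceFactory getDataSourceFactory(AppConfig c) {
                    return c.getDatabase();
                }
            };

    // Second bundle: registered under the name "slaveDb" (the name() override is available in 1.x+).
    private final HibernateBundle<AppConfig> slaveBundle =
            new HibernateBundle<AppConfig>(Report.class) {
                @Override
                protected String name() {
                    return "slaveDb";
                }

                @Override
                public DataSourceFactory getDataSourceFactory(AppConfig c) {
                    return c.getSlaveDatabase();
                }
            };

    @Override
    public void initialize(Bootstrap<AppConfig> bootstrap) {
        bootstrap.addBundle(masterBundle);
        bootstrap.addBundle(slaveBundle);
    }

    @Override
    public void run(AppConfig config, Environment env) {
        // The read DAO is built on the slave session factory, so its queries hit the replica.
        env.jersey().register(new ReportResource(new ReportReadDao(slaveBundle.getSessionFactory())));
    }
}

// DAO bound to the slave session factory:
class ReportReadDao extends AbstractDAO<Report> {
    ReportReadDao(SessionFactory slaveSessionFactory) {
        super(slaveSessionFactory);
    }

    List<Report> findAll() {
        return currentSession().createQuery("from Report", Report.class).list();
    }
}

// In the resource class, @UnitOfWork("slaveDb") (io.dropwizard.hibernate.UnitOfWork) tells the
// listener to open the session on the session factory registered under that name, while a plain
// @UnitOfWork stays on the default (master) one:
//
//     @GET
//     @UnitOfWork("slaveDb")
//     public List<Report> getReports() {
//         return readDao.findAll();
//     }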
If you are using the @UnitOfWork annotation, then no and yes.
No, it does not directly let you talk to the DB in read-only mode.
Yes, you can create two resources, each using a different DB (master and slave).
One resource handles writes and critical reads (master), the other handles read-only queries (slave).
https://groups.google.com/forum/#!topic/dropwizard-user/nxURxVWDtEY
Also, as the link suggests, the MySQL driver can do this automatically, but for that the session's readOnly flag must be true, which UnitOfWorkApplicationListener does not set properly even if you set readOnly = true in @UnitOfWork.
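If you do want to lean on the driver for the read/write split instead of a second session factory, the connection has to be flagged read-only by your own code. A minimal sketch, assuming a DAO extending Dropwizard's AbstractDAO and using Hibernate's standard Session.doWork:

// Inside a DAO method that runs under @UnitOfWork, before issuing the read-only queries:
currentSession().doWork(connection -> connection.setReadOnly(true));

Depending on your connection pool's settings, you may need to reset the flag before the connection is reused for writes.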
We are using Spring, Spring Data and JPA in our project.
For the production servers, we would like to set up a database cluster such that all read queries are directed to one server and all write queries are directed to another server.
This obviously will require some changes in the way the DAOs are built.
Does anyone know how to achieve this if one has, so far, been following cook-book-style DAO creation using Spring Data/JPA, where a DAO implementation is responsible for both reads and writes? What kind of architectural changes will be needed to segregate the two types of calls?
When using MySQL, it is common for Java developers to use Connector/J as the JDBC driver. Developers typically use the Connector/J com.mysql.jdbc.Driver class, with a URL such as jdbc:mysql://host[:port]/database to connect to MySQL databases.
Connector/J offers another driver called ReplicationDriver that allows an application to load-balance between multiple MySQL hosts. When using ReplicationDriver, the JDBC URL changes to jdbc:mysql:replication://master-host[:master-port][,slave-1-host[:slave-1-port]][,slave-2-host[:slave-2-port]]/database. This allows the application to connect to one of multiple servers depending on which one is available at any given point in time.
When using the ReplicationDriver, if a JDBC connection is set to read-only, the driver treats the first host declared in the URL as a read-write host and all others as read-only hosts. Developers can take advantage of this in a Spring application by structuring their code as follows:
@Service
@Transactional(readOnly = true)
public class SomeServiceImpl implements SomeService {

    public SomeDataType readSomething(...) { ... }

    @Transactional(readOnly = false)
    public void writeSomething(...) { ... }
}
With code like this, whenever readSomething is called, Spring's transaction management code obtains a JDBC Connection and calls setReadOnly(true) on it, because the service methods are annotated with @Transactional(readOnly = true) by default. This makes all database queries from readSomething go to one of the non-master MySQL hosts, load-balanced in a round-robin fashion. Similarly, whenever writeSomething is called, Spring calls setReadOnly(false) on the underlying JDBC Connection, forcing the queries to go to the master server.
This strategy allows the application to direct all read-only traffic to one set of MySQL servers and all read-write traffic to a different server, without changing the application's logical architecture or the developers having to worry about different database hosts and roles.
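For completeness, a sketch of how the ReplicationDriver can be plugged into a Spring DataSource (the host names, credentials and the Connector/J 5.x class name are placeholders; DriverManagerDataSource is unpooled and only used here to keep the example short):

import javax.sql.DataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.DriverManagerDataSource;

@Configuration
public class ReplicationDataSourceConfig {

    @Bean
    public DataSource dataSource() {
        DriverManagerDataSource ds = new DriverManagerDataSource();
        ds.setDriverClassName("com.mysql.jdbc.ReplicationDriver");
        // The first host is treated as the read-write master, the remaining hosts as read-only slaves.
        ds.setUrl("jdbc:mysql:replication://master-host:3306,slave-1-host:3306,slave-2-host:3306/mydb");
        ds.setUsername("app_user");
        ds.setPassword("secret");
        return ds;
    }
}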
Well, what you are talking about is actually called CQRS (http://martinfowler.com/bliki/CQRS.html).
I would suggest reading some of the concept guidelines before attempting to implement it.
As for your question, for a quick first win I would suggest starting by dividing the DAL services into Finder classes and Repository classes, which are then used by higher-level, business-oriented services.
Finders suit read-only access: they expose only getBy...() methods and lookups that return custom result objects, such as reports, and their underlying implementation is tailored to work against the read-only database.
Repositories, on the other hand, suit writes plus getById() lookups, and their underlying implementation is tailored to work against the write-only database.
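A rough sketch of that split, assuming Spring's @Transactional is in play and with made-up entity and method names (bodies elided):

import java.util.List;
import org.springframework.stereotype.Repository;
import org.springframework.transaction.annotation.Transactional;

// Read side: lookups only, marked read-only so it can be routed to the read database.
@Repository
@Transactional(readOnly = true)
class OrderFinder {
    public List<OrderSummary> getByCustomer(long customerId) { ... }
    public List<OrderReport> findMonthlyReport(int year, int month) { ... }
}

// Write side: mutations plus getById(), routed to the write (master) database.
@Repository
@Transactional
class OrderRepository {
    public Order getById(long id) { ... }
    public void save(Order order) { ... }
}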
The only thing left is the synchronisation between those databases.
This can be achieved quite simply with technical solutions such as database replication, or postponed updates to the read-only database after changes are made to the write-only database (eventual consistency).
I have the following setup:
Several data-processing workers get their configuration from a Django view, get_conf(), over HTTP.
The configuration is stored in a Django model using the MySQL / InnoDB backend.
The configuration model has an overridden save() method that tells the workers to reload their configuration.
I have noticed that sometimes the workers do not receive the changed configuration correctly. In particular, when the conf reload time was shorter than usual, the workers got "old" configuration from get_conf() (missing the most recent change). The transaction model used in Django is the default autocommit.
I have come up with the following possible scenario that could cause the behavior:
1. The new configuration is saved.
2. save() returns, but MySQL / InnoDB is still processing the (auto)commit.
3. The workers are booted and make an HTTP request for the new configuration.
4. The MySQL (auto)commit finishes.
Is step 2 in the above scenario possible? That is, can a Django model's save() return before the data is actually committed in the DB when the default autocommit transactional method is used? Or, to go one layer down, can a MySQL autocommitted INSERT or UPDATE operation finish before the commit is complete (i.e. before the update / insert is visible to other transactions)?
The object may be stale; try refreshing it after saving:
obj.save()
obj.refresh_from_db()
reference: https://docs.djangoproject.com/en/1.8/ref/models/instances/#refreshing-objects-from-database
This definitely looks like a race condition.
The scenario you describe should never happen if there is only one script and one database. When you call save(), the method doesn't return until the data is actually committed to the database.
If, however, you're using a master/slave configuration, you could be a victim of replication lag: if you write to the master but read from the slaves, it is entirely possible that your script doesn't wait long enough for the replication to occur, and you read the old conf from a slave before it has had the opportunity to replicate from the master.
Such a configuration can be set up in Django using database routers, or it can be done on the DB side using a DB proxy. Check that out.
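For reference, a read/write-splitting router roughly looks like this (the alias names "default" and "replica" are assumptions and must match your DATABASES setting); note that with such a router in place, the replication lag described above applies to every read:

class ReadReplicaRouter:
    def db_for_read(self, model, **hints):
        # Send all reads to the replica.
        return "replica"

    def db_for_write(self, model, **hints):
        # Send all writes to the master ("default").
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return db == "default"

# settings.py (path is hypothetical):
# DATABASE_ROUTERS = ["myproject.routers.ReadReplicaRouter"]

If get_conf() must always see the latest configuration, it can read explicitly from the master, e.g. Model.objects.using("default").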
I'm using SQLAlchemy with MySQL and Pyro to build a server program. Many clients connect to this server to make requests. The program only serves information from the MySQL database and sometimes performs some calculations.
Is it better to create a session for each client or to use the same session for all clients?
What you want is a scoped_session.
The benefits are (compared to a single shared session between clients):
No locking needed
Transactions supported
Connection pool to the database (implicitly handled by SQLAlchemy)
How to use it
You just create the scoped_session:
from sqlalchemy.orm import scoped_session
Session = scoped_session(some_factory)
and access it in your Pyro methods:
class MyPyroObject:
    def remote_method(self):
        Session.query(MyModel).filter...
Behind the scenes
The code above guarantees that the session is created and closed as needed. The session object is created as soon as you access it for the first time in a thread, and it will be removed/closed after the thread is finished. As each Pyro client connection gets its own thread with the default settings (don't change that!), you will have one session per client.
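A slightly fuller sketch under these assumptions: a placeholder MySQL URL, a mapped class called MyModel with a name column, SQLAlchemy 1.x, and Pyro's default threaded server so each client request runs in its own worker thread:

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine("mysql://user:password@localhost/mydb", pool_recycle=3600)
Session = scoped_session(sessionmaker(bind=engine))

class MyPyroObject:
    def remote_method(self, model_id):
        # The scoped_session hands each worker thread its own session (and pooled connection).
        obj = Session.query(MyModel).get(model_id)
        return obj.name if obj is not None else None

    def cleanup(self):
        # Optionally return the thread's session and connection to the pool when a client is done.
        Session.remove()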
The best I can do is to create a new Session for every client request. I hope there is no performance penalty.
I'm hoping someone can tell me whether there is a way to provide automatic redundancy using JPA. We're currently using EclipseLink (but can change, should another provider have a suitable solution), and we need to ensure that we switch to our backup database should our primary database become unavailable (since it's not located in the same building as our application). Thanks for your input.
The easiest way is to change the JDBC connection URL as explained in the MySQL documentation. For example:
jdbc:mysql://master.server.com:3306,backup.server.com:3306/dbname
In this scenario, if master.server.com fails, the driver will redirect commands to backup.server.com. I strongly suggest reading the whole documentation, as there are a lot of properties that change the failover behaviour, in particular the section on High Availability and Clustering.
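A sketch of passing that failover URL to a plain JPA bootstrap (the persistence-unit name, hosts and credentials are placeholders; the javax.persistence.jdbc.* keys are standard JPA 2.0 properties, so this is not EclipseLink-specific, and failOverReadOnly/autoReconnect are among the Connector/J properties covered in the documentation mentioned above):

import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class FailoverBootstrap {
    public static EntityManagerFactory create() {
        Map<String, String> props = new HashMap<>();
        props.put("javax.persistence.jdbc.driver", "com.mysql.jdbc.Driver");
        // Connector/J failover URL: primary first, then the backup host.
        props.put("javax.persistence.jdbc.url",
                "jdbc:mysql://master.server.com:3306,backup.server.com:3306/dbname"
                        + "?failOverReadOnly=false&autoReconnect=true");
        props.put("javax.persistence.jdbc.user", "app_user");
        props.put("javax.persistence.jdbc.password", "secret");
        return Persistence.createEntityManagerFactory("my-unit", props);
    }
}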
Am I correct in assuming that if a different process updates the DB, then my NHibernate-powered application will be out of sync? I'm mostly using non-lazy updates.
My target DB is MySQL 5.0, if it makes any difference.
There isn't a simple way to answer that without more context.
What type of application are you thinking about (web, desktop, other)?
What do you think would be out of sync exactly?
If you have a desktop application with an open window and an open session that has data loaded, and you change the same entities somewhere else, then of course the loaded data will be out of sync with the DB, but you can use Refresh to update those entities.
If you use NH second-level caching and you modify the cached entities somewhere else, the cache contents will be out of sync, but you can still use Refresh or cache-controlling methods to update directly from the DB.
In all cases, NH provides support for optimistic concurrency by using Version properties; those prevent modifications to out-of-sync entities.
Yes, the objects in your current session will be out of sync, the same way a DataSet/DataTable would be out of sync if you fetch it and another process updates the same data.