Microservice Database shared with other services - mysql

Something I have searched for but cannot find a straight answer to is this:
For a given service, if there are two instances of that service deployed to two machines, do they share the same persistent store or do they have separate stores with some syncing mechanism (master/slave, clustering)?
E.g. I have an OrderService backed by MySQL. We're getting many orders in, so I need to scale this service up by deploying a second OrderService. Where does its data come from?
It may sound silly, but to me every discussion makes it seem like the service and database are a packaged unit that is deployed together. Few discussions mention what happens when you deploy a second instance of the service.

Posting this as an answer because it's too long for a comment.
Microservices are self-contained components and as such are responsible for their own data. If you want to get at the data, you have to talk to the service's API. This applies mainly to different kinds of services (i.e. you don't share a database among services that offer different kinds of business functionality). That is bad practice because you couple the services at the hip through the database, and it then becomes easy to do more things through the database that would normally be done at the API level, just because it's more convenient, and you risk losing componentization.
But if you have multiple instances of the same kind of service, then there are, as you mentioned, two obvious choices: share a database, or have each instance contain its own database.
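To make the shared-database option concrete, here is a minimal sketch (the mysql2 driver, host names, and table layout are illustrative assumptions of mine, not anything from the question): every deployed OrderService instance points at the same MySQL connection settings, so the second instance's data comes from the same database as the first.

```typescript
// Minimal sketch, assuming the mysql2 driver; host/database/table names are made up.
// Each OrderService instance reads the same connection settings, so scaling the
// service out horizontally does not change where the data lives.
import { createPool } from "mysql2/promise";

const pool = createPool({
  host: process.env.ORDER_DB_HOST ?? "orders-db.internal",
  user: process.env.ORDER_DB_USER ?? "orders",
  password: process.env.ORDER_DB_PASSWORD,
  database: "orderdb",
  connectionLimit: 10,
});

// Any instance can take the write; MySQL is the single source of truth.
export async function createOrder(customerId: number, totalCents: number) {
  const [result] = await pool.execute(
    "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)",
    [customerId, totalCents]
  );
  return result;
}
```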
Now you have to ask yourself which solution to choose:
Are these OrderServices of yours truly capable of working on their own, or do you need all the orders in the same database for reporting or for access by other applications?
Determine your actual bottleneck. Is it the database? If not, share the database. Is it the services? If not, distribute your data.
If you need to distribute the data: what are your choices, and what are your needs? Do you need to be consistent all the time, or is eventual consistency good enough? Do you need to have separate databases and synchronize them manually, or does your database installation handle replication and partitioning out of the box?
etc.
What I'm trying to say is that in these kinds of situations the answer is: it depends. And something we tech geeks often forget to do before embarking on such distributed/scalability/architecture journeys is to talk to the business. Often the business can tolerate a certain degree of inconsistency, suboptimal processes, or looking up data in several places instead of one (i.e. what you think is important might not necessarily be important to the business). So talk to them and see what they can tolerate. It might be cheaper to resolve something in an operational way than to invest a lot in trying to build a highly distributable system.

Related

Session storage preferences in node.js

I have a node.js application that uses a MySQL database, and I want to know what would be a good place for storing the sessions.
My application is actually a final project for one of my courses, but it could become a real-world application later, as we are rewriting software that is currently used by the university. I could use MySQL as the session store, but I want to build my application with the most reliable approach, or at least best practice, for my situation.
I have read many posts/answers/forums, and opinion is divided. Would using another technology like Memcached/MemcacheDB or Redis, just for the session store, be a recommended approach? Or should I just stick to MySQL and deal with scaling later if the server load increases?
Even if the application is later used in the real world, it would only be used by the university's undergraduate students and faculty, so the user base is fairly limited.
As of now, I'm leaning towards MySQL for the session store.
I am replying under the assumption that you are using MySQL throughout the whole application.
If the application will only be used in the context of your university, it will quite possibly not have scaling issues. SQL databases are not bad; they can handle quite a lot of data efficiently, you just need to be careful in the first place and write efficient queries. Be careful with joins, because they can really kill the server. You also need to analyze your application quite a lot. For example, why do you think you will have scaling/performance issues on the sessions and not in another part of your application? Do a bit of load testing, get some metrics, and try to understand whether you actually need it or not.
If you are a student, though, and you don't have prior experience with Redis, I would go with Redis, because it is good to work with a new technology and gain a bit more experience :)
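If you do try Redis for the session store, a minimal sketch with express-session and connect-redis looks roughly like this (the connect-redis API has changed across major versions, so treat the wiring as illustrative rather than definitive):

```typescript
// Rough sketch: Express sessions stored in Redis instead of MySQL.
// Assumes express-session plus a connect-redis v7-style API; adjust for your versions.
import express from "express";
import session from "express-session";
import { createClient } from "redis";
import RedisStore from "connect-redis";

// Let TypeScript know about the custom field we keep in the session.
declare module "express-session" {
  interface SessionData {
    views?: number;
  }
}

const redisClient = createClient({ url: "redis://localhost:6379" });
redisClient.connect().catch(console.error);

const app = express();
app.use(
  session({
    store: new RedisStore({ client: redisClient }),
    secret: process.env.SESSION_SECRET ?? "change-me",
    resave: false,
    saveUninitialized: false,
    cookie: { maxAge: 60 * 60 * 1000 }, // one hour
  })
);

app.get("/", (req, res) => {
  // Each visit bumps a per-session counter that lives in Redis.
  req.session.views = (req.session.views ?? 0) + 1;
  res.send(`Views this session: ${req.session.views}`);
});

app.listen(3000);
```

Swapping the store later (for example to a MySQL-backed session store) only changes the `store` option, so starting on MySQL and moving to Redis if load demands it is a low-risk path.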

Any drawback of building website based on JSON API for Data Access Layer

For instance, in e-commerce websites we generally have two interfaces: one with which the customer interacts and places orders, and one with which company employees interact to manage orders, customers, etc.
Suppose we divide this website into two different websites. That means two different projects altogether, not dependent on each other. The only thing common to both websites will be the database: both websites will be using the same database. What, then, would be a good option for the Data Access Layer?
Each website has its own database access code and entities.
Link both websites with a centralized layer which exposes reads/writes to the database through a JSON-based API.
In my opinion the second option would be better, as it removes the direct dependency on the database: any change made to the database does not need to be made in two places. And there are many other benefits.
My only concern is how much it could hamper the performance of the overall system, because in that case we are serializing and deserializing objects and also making HTTP connections.
Could someone please shed some light on the benefits and drawbacks of an API-backed Data Access Layer compared to each website having its own database access code?
People disagree about the best architecture for this sort of thing, but one common and popular architectural guideline suggests that you avoid integrating two products at the database layer at all costs. It is simpler to have two separate apps and databases which can change independently of each other, and if you need to reference data from one in the other you should have some sort of event pipeline between the two, configured on the ESB.
And you should probably have more than two back-end databases anyway -- unless you have an incredibly simple system with only the two classes of objects you mentioned, you'll probably find that you have more than two bounded contexts.
Also, if your performance requirements increase, you'll probably want to look at splitting the read and write sides of your services and databases, connecting the two sides through an eventing system of some sort (maybe event sourcing).
Before you decide what to do, you should read Implementing Domain-Driven Design by Vaughn Vernon, the article on CQRS by Martin Fowler, and the article on event sourcing, also from Fowler. For extra points you should also read Fowler on microservices architecture.
Finally, on JSON -- I'm a big fan -- but you should only use it at the repository interface if you're either using JavaScript on the back end (which is a great idea if you're using io.js and Koa) and the front end (Backbone & Marionette, please), or if you're using a data source that natively emits JSON. If you have to parse it, it's only going to slow you down, so use some format native to the data source and its consumers; that way you'll be as fast as possible.
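As a hedged illustration of that read/write split (none of this is prescribed above; the event names and the in-process emitter standing in for a real ESB or broker are my own invention), the write side can publish an event after each change and the read side can maintain its own denormalized view from those events:

```typescript
// Toy illustration of splitting reads from writes and connecting them with
// events; Node's built-in EventEmitter stands in for a real broker/ESB.
import { EventEmitter } from "node:events";

interface OrderPlaced {
  orderId: string;
  customerId: string;
  totalCents: number;
}

const bus = new EventEmitter();

// Write side (storefront): persist the order (omitted), then publish the fact.
function placeOrder(order: OrderPlaced): void {
  // ... INSERT into the storefront's own database here ...
  bus.emit("order.placed", order);
}

// Read side (admin site): keep a per-customer summary fed by events instead of
// querying the storefront's database directly.
const totalsByCustomer = new Map<string, number>();
bus.on("order.placed", (e: OrderPlaced) => {
  totalsByCustomer.set(e.customerId, (totalsByCustomer.get(e.customerId) ?? 0) + e.totalCents);
});

placeOrder({ orderId: "o-1", customerId: "c-42", totalCents: 1999 });
console.log(totalsByCustomer.get("c-42")); // 1999
```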
An API-centric approach makes more sense, as the data is standardised and you get more flexibility: the API can be consumed from any language and by one or multiple interfaces.
Performance-wise, this greatly depends on the quality and implementation of the technology stack behind the API. You could also look at caching certain data on the frontend to improve page-load time.
The guys over at moltin have already built a platform like this and I've had great success using it. There's already a backend dashboard and the response times are pretty fast too!
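For concreteness, a minimal sketch of option 2 from the question might look like the following (Express and mysql2 are just example choices on my part; the endpoint and table names are invented):

```typescript
// Thin JSON API in front of the shared database; both the customer-facing site
// and the admin site call these endpoints instead of embedding their own SQL.
import express from "express";
import { createPool } from "mysql2/promise";

const pool = createPool({
  host: "localhost",
  user: "shop",
  password: process.env.SHOP_DB_PASSWORD,
  database: "shop",
});

const api = express();
api.use(express.json());

// Read: schema changes are absorbed here once, not in two separate code bases.
api.get("/orders/:id", async (req, res) => {
  const [rows] = await pool.execute("SELECT * FROM orders WHERE id = ?", [req.params.id]);
  res.json((rows as any[])[0] ?? null);
});

// Write: the admin site updates an order's status through the same layer.
api.put("/orders/:id/status", async (req, res) => {
  await pool.execute("UPDATE orders SET status = ? WHERE id = ?", [
    req.body.status,
    req.params.id,
  ]);
  res.sendStatus(204);
});

api.listen(3000);
```

The serialization and HTTP overhead the question worries about is real but usually small per call; caching hot reads (as suggested above) and avoiding chatty endpoints generally matters more than the JSON encoding itself.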

one big database, or one per client?

I've been asked to develop an application that will be rolled out to a number of business units. The application will be basically the same for each unit, but will have minor procedural differences which won't change the structure of the underlying database. Should I use one database per business unit, or one big database for all the units? The business units are totally separate.
My preference is for one database per client. The advantages:
if a client gets too big, they're easy to move - backup, restore, change the connection string, boom. Try doing that when their data is mixed in with others in a massive database. Even if you use schemas and filegroups to segregate, moving them is not a cakewalk.
ditto for deleting a client's data when they move on.
by definition you're keeping each client's data separate. This is often going to be a want, and sometimes a need. Sometimes it will even be legally binding.
all of your code within a database is simpler - it doesn't have to include the client's schema (which can't be parameterized) and your tables don't have to be littered with an extra column indicating the client.
A lot of people will claim that managing 200 or 500 databases is a lot harder than managing 10 databases. It's not really any different, in my experience. You build scripts that automate things, you stagger index maintenance and backup jobs, etc.
The potential disadvantages are when you get up into the realm of 4-digit and higher databases per instance, where you want to start thinking about having multiple servers (the threshold really depends on the workload and the hardware, so I'm just picking a number). If you build the system right, adding a second server and putting new databases there should be quite simple. Again, the app should be aware of each client's connection string, and all you're doing by using different servers is changing the instance the connection string points to.
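To illustrate the "app knows each client's connection string" point (every name below is invented), per-client routing can be as simple as a lookup table of connection strings with one pool per client:

```typescript
// Sketch of one-database-per-client routing: pick the connection pool by tenant.
// Moving a big client to another server is then just a connection-string change.
import { createPool } from "mysql2/promise";
import type { Pool } from "mysql2/promise";

const tenantConnectionStrings: Record<string, string> = {
  unitA: "mysql://app:secret@db1.internal/unit_a",
  unitB: "mysql://app:secret@db1.internal/unit_b",
  unitC: "mysql://app:secret@db2.internal/unit_c", // already moved to a second server
};

const pools = new Map<string, Pool>();

export function poolFor(tenant: string): Pool {
  const dsn = tenantConnectionStrings[tenant];
  if (!dsn) throw new Error(`Unknown tenant: ${tenant}`);
  let pool = pools.get(tenant);
  if (!pool) {
    pool = createPool(dsn);
    pools.set(tenant, pool);
  }
  return pool;
}
```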
Some questions over on dba.SE you should look at. They're not all about SQL Server, but many of the concepts and challenges are universal:
https://dba.stackexchange.com/questions/16745/handling-growing-number-of-tenants-in-multi-tenant-database-architecture
https://dba.stackexchange.com/questions/5071/what-are-the-performance-implications-of-running-multiple-smaller-dbs-instead-of
https://dba.stackexchange.com/questions/7924/one-big-database-vs-several-smaller-ones
Your question is a design question. In order to answer it, you need to understand the requirements of the system that you want to build. From a technical perspective, SQL Server -- or really any database -- can handle either scenario.
Here are some things to think about.
The first question is how separate your clients need the data to be. Mixing data together from different business units may not be legal in some cases (say, the investment side of a bank and the market analysis side). In such situations, separate databases are the solution.
The next question is security. In some situations, clients might be very uncomfortable knowing that their data is intermixed with other clients' data. A small slip-up, and confidential information is inadvertently shared. This is probably not an issue for different business units in the same company.
Do you have to deal with different uptime requirements, upload requirements, customizations, and perhaps interaction with other tools? If one business unit will need customizations ASAP that other business units are not interested in, then that suggests different databases.
Another consideration is performance. Does this application use a lot of expensive resources? If so, being able to partition the application on different databases -- and potentially different servers -- may be highly desirable.
On the other hand, if much of the data is shared, and the repository is really a central repository with the same underlying functionality, then one database is a good choice.

Distributed db's or not?

INFORMIX-SQL 7.32 (SE) Linux Pawnshop App.
I have some users who own several pawnshops within a 100-mile radius. Each pawnshop app runs with SE. The only functionality these owners need is: the ability to remotely log in to any store in order to view transactions and running totals, and to consolidate daily totals at the end of the business day. This can be accomplished with dial-up modems, as the app doesn't have any need to display BLOBs. At end of day, each store's totals are unloaded to a flat file and transferred to the owner's system.
What would my owners gain by converting to distributed DBs? The ability to find out whether a store's customer has conducted business in another store, or whether another store has a desired inventory item for sale? (Not important; it seldom happens.) Most customers will usually do business with the same store, and if that store doesn't have a desired item for sale, they will visit the closest competitor's pawnshop. What would distributed DBs offer to accomplish the same functionality described in the first paragraph? The pawnshop owners absolutely refuse to connect their production systems via the internet! They don't trust its security, even using VPNs, Cisco gear, etc., or its reliability. In this part of the world, ISPs have a bad track record for uptime. I know of several apps which have converted from web to dial-up because of comms problems!
Distributed DBs, more precisely Informix XPS and IDS, don't have just one advantage. If you care only about getting data from different places, you can accomplish that with a design strategy alone: if you add a "branch_id" column, or something like that, you're done.
Distributed DBs have a lot of advantages, from availability to scalability. You must review all these things first.
Sorry for this kind of answer, but it is really difficult to give a straight answer on this topic.
CouchDB is a peer based distributed database system. Any number of CouchDB hosts (servers and offline-clients) can have independent “replica copies” of the same database, where applications have full database interactivity (query, add, edit, delete). When back online or on a schedule, database changes are replicated bi-directionally.
CouchDB has built-in conflict detection and management and the replication process is incremental and fast, copying only documents and individual fields changed since the previous replication. Most applications require no special planning to take advantage of distributed updates and replication.
Unlike cumbersome attempts to bolt distributed features on top of the same legacy models and databases, it is the result of careful ground-up design, engineering and integration. The document, view, security and replication models, the special purpose query language, the efficient and robust disk layout are all carefully integrated for a reliable and efficient system.
If you are not going to have a connection between the databases with roughly 90%+ uptime, then there isn't any benefit to distributed databases.
One main benefit is to give large businesses a 'failover' when one machine goes down or is unavailable. If they have the database distributed over three or four machines, then the loss of one doesn't impact their ability to do business.
A second major benefit is when a database is simply too big for one server to cope with. 'Internet scale' databases (Amazon, Twitter, etc) have that level of traffic. Walmart would have that level of traffic. A couple of storefront operations wouldn't.
I think that this is a context where there is little to gain from distributed database operation.
If you were to go towards distributed operation, I'd probably look at using a simple ER (Enterprise Replication) topology, with the 'head office' store being the primary (root) node and the other shops being leaf nodes. You would then have changes to the individual store databases replicated to the HQ node; you might or might not also propagate the data back to the other stores. Especially with just two stores, you might in fact simply replicate all the information to both stores; this gives you an automatic off-site backup of the database. (You'd probably configure all nodes as root nodes in this case - at least until the chain grew to, say, five or six nodes.)
This would give you some resiliency for disaster recovery. It would also allow the HQ (in particular) to see what is going on at each store.
My impression is that you are probably not discussing 'transactions per second' on average; the rate of transactions at a single store is probably a few transactions per minute, with 'few' possibly being less than one TPM. Consequently, the network bandwidth is unlikely to be a bottleneck at any point, even with dial-up speeds (though that might be borderline).
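If the owners stay with the existing end-of-day flat-file unloads, the consolidation step on the owner's system is small enough to script, which reinforces the point that there is little to gain from a distributed setup here. An illustrative sketch follows (it assumes pipe-delimited unload files and a made-up store_id|date|total layout, so adjust it to the real files):

```typescript
// Illustrative only: consolidate per-store end-of-day unload files on the
// owner's machine. Assumes pipe-delimited lines of store_id|business_date|total.
import { readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";

const unloadDir = "/data/eod"; // hypothetical directory the modem transfers land in

let grandTotal = 0;
for (const file of readdirSync(unloadDir)) {
  for (const line of readFileSync(join(unloadDir, file), "utf8").split("\n")) {
    if (!line.trim()) continue;
    const [storeId, businessDate, total] = line.split("|");
    console.log(`${businessDate} store ${storeId}: ${total}`);
    grandTotal += Number(total);
  }
}
console.log(`Consolidated daily total: ${grandTotal.toFixed(2)}`);
```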

What data entry system should I choose for multiple users at multiple sites?

I've just started working on a project that will involve multiple people entering data from multiple geographic locations. I've been asked to prepare forms in Access 2003 to facilitate this data entry. Right now, copies of the DB (with my tables and forms) will be distributed to each of the sites, returned to me, and then I get to hammer them all together. I can do that, but I'm hoping that there is a better way - if not for this project, then for future projects.
We don't have any funding for real programming support, so it's up to me. I am comfortable with HTML, CSS, and SQL, have played around with Django a fair bit, and am a decently fast learner. I don't have much time to design forms, but they don't have to actually function for a few months.
I think there are some substantial benefits to web-based forms (primary keys are set centrally, I can monitor data entry, form changes are immediately and universally deployed, I don't have to do tech support for different versions of Access). But I'd love to hear from voices of experience about the actual benefits and hazards of this stuff.
This is very lightweight data entry - three forms attached to three tables, linked by person ID, certainly under 5000 total records. While this is hardly bank account-type information, I do take the security of these data seriously, so that's an additional consideration. Any specific technology recommendations?
Options that involve Access:
use Jet replication. If the machines where the data editing is being done can be connected via wired LAN to the central network, synchronization would be very easy to implement (via simple Direct Synchronization, only a couple of lines of code). If not (as seems to be the case), it's an order of magnitude more complex and requires significant setup of the remote systems. For an ongoing project it can be a very good solution; for a one-off, not so much. See the Jet Replication Wiki for lots of information on Jet replication. One advantage of this solution is that it works completely offline (i.e., no Internet connection needed).
use Access for the front end and SQL Server (or some other server database) for the back end. Provide a mechanism for remote users to connect to the centrally-hosted database server, either over VPN (preferred) or by exposing a non-standard port to the open Internet (not recommended). For lightweight editing this shouldn't require much optimization of the Access app to get a usable application, but it isn't going to be as fast as a local connection, and how slow it is will depend on the users' Internet connections. This solution does require an Internet connection to be used.
host the Access app on a Windows Terminal Server. If the infrastructure is available and there's a budget for CALs (or if the CALs are already in place), this is a very, very easy way to share an Access app. Like #2, this requires an Internet connection, but it puts all the administration in one central location and requires no development beyond what's already been done to create the existing Access app.
For non-Access solutions, it's a matter of building a web front end. For the size app you've outlined, that sounds pretty simple for the person who already knows how to do that, not so much for the person who doesn't!
Even though I'm an Access developer, based on what you've outlined, I'd probably recommend a light-weight web-based front end, as simple as possible with no bells and whistles. I use PHP, but obviously any web scripting environment would be appropriate.
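To show how small that web front end could be (Express and MySQL here are my choices for the example, not a recommendation over PHP; the table and field names are invented), the properties the question wants fall out naturally: primary keys are assigned centrally by the database, and every site posts to the same, centrally updated endpoint.

```typescript
// Minimal sketch of a central web data-entry endpoint: every site posts to the
// same server, so primary keys come from one AUTO_INCREMENT sequence and form
// changes are deployed in exactly one place.
import express from "express";
import { createPool } from "mysql2/promise";

const pool = createPool({
  host: "localhost",
  user: "entry",
  password: process.env.ENTRY_DB_PASSWORD,
  database: "study",
});

const app = express();
app.use(express.urlencoded({ extended: false })); // plain HTML form posts

app.post("/person", async (req, res) => {
  const [result] = await pool.execute(
    "INSERT INTO person (first_name, last_name) VALUES (?, ?)",
    [req.body.first_name, req.body.last_name]
  );
  // The centrally assigned person ID is what links the other two forms' records.
  res.send(`Saved person #${(result as any).insertId}`);
});

app.listen(8080);
```

Put it behind HTTPS and some form of login, and the security of the data is at least no worse than mailing Access files between sites.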
I agree with David: a web-based solution sounds the most suitable.
I use CodeCharge Studio for that: it has a very Access-like interface, lots of wizards to create online forms etc. CCS offers a number of different programming languages; I use PHP, as part of a LAMP stack.