Microservice Architecture design - MySQL

I have a few doubts about microservice architecture.
Let's say there are microservices A, B and C.
A maintains the context of a job (apart from other things it does), and B and C work to fulfill that job by doing their respective tasks.
Here I have questions.
1. DB design
I am talking about SQL here.
Using foreign keys simplifies a lot of things.
But as I understand microservice architecture, every microservice maintains its own data, and that data has to be queried through the owning service if required.
Does that mean no foreign keys referring to tables in another microservice?
2. Data Flow
As I see it, there are two ways. All the queries are done using a jobId that is maintained uniquely in all the microservices for a job.
Client requests go directly to the individual service for a task. To get a summary of the job, the client queries the individual microservices, collects the data and passes it to the user.
Do everything through a coordinating microservice. Client requests go to service A, and in turn service A gathers the info from all the other microservices for that jobId and passes it to the user.
Which of the above two should be followed, and why?

You're correct in thinking that microservices should ideally have their own data structures so they can be deployed independently. However, there are several design patterns that help you, and that doesn't necessarily translate into "no FKs". Please refer to:
Database per service
Sagas
API Composition
CQRS
The patterns listed above answer both your questions.

Does it mean no foreign keys referring to tables in another microservices?
Not in the database sense. One microservice may hold IDs of remote entities but should not assume anything about the remote microservice's persistence (i.e. the database type; it could be anything from SQL to NoSQL).
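As a hedged illustration (the table, column and URL names here are hypothetical), service B might keep the jobId it received from service A as a plain column, with no cross-database foreign-key constraint, and resolve the job's details over A's API when needed:

```python
import sqlite3   # stand-in for MySQL; the point is the schema, not the driver
import requests

# Service B's own schema: it records which job a task belongs to,
# but declares no FOREIGN KEY into service A's database.
conn = sqlite3.connect("service_b.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS task (
        task_id INTEGER PRIMARY KEY,
        job_id  TEXT NOT NULL,            -- ID of a remote entity owned by service A
        status  TEXT NOT NULL DEFAULT 'pending'
    )
""")
conn.commit()

def job_details(job_id: str) -> dict:
    """Resolve the remote entity through service A's API, never its database."""
    resp = requests.get(f"http://service-a.internal/jobs/{job_id}")  # hypothetical URL
    resp.raise_for_status()
    return resp.json()
```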
Which of the above two has to be followed and why?
This really depends. There are two types of architectures: choreography and orchestration. Both of them are good. Which one to use? Only you can decide. Here are a few blog posts about them:
Microservices — When to React Vs. Orchestrate
Benefits of Microservices - Choreography over Orchestration, Low Coupling and High Cohesion
Also, the solution to this SO question might be useful.
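To make the orchestration option from the question concrete, here is a minimal API-composition sketch (service names, URLs and routes are assumptions): service A composes a job summary by calling B and C with the shared jobId.

```python
from flask import Flask, jsonify
import requests

app = Flask(__name__)

# Hypothetical internal endpoints of the downstream services.
SERVICE_B = "http://service-b.internal"
SERVICE_C = "http://service-c.internal"

@app.route("/jobs/<job_id>/summary")
def job_summary(job_id):
    """API composition: A gathers each service's view of the job and merges it."""
    b_part = requests.get(f"{SERVICE_B}/jobs/{job_id}/tasks", timeout=2).json()
    c_part = requests.get(f"{SERVICE_C}/jobs/{job_id}/tasks", timeout=2).json()
    return jsonify({"jobId": job_id, "b": b_part, "c": c_part})
```

In the choreography style there is no such aggregator; B and C would instead react to events and the client (or a read model) would assemble the summary.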

Related

Migrating previously collected datasets to FIWARE backend

I have at hand the task of migrating previously collected environmental datasets (weather, air quality, noise, etc.) from sensors deployed in different locations, currently stored in several tables of a MySQL database, to my instance of the FIWARE Orion CB, so they are persisted to the FIWARE backend.
The challenges are many:
the data isn't stored following FIWARE standards, so it must be transformed according to the FIWARE data models.
not all tables are good candidates for being transformed into an Entity.
some Entities need to have field values from several tables as attributes. For instance, the AirQualityObserved Entity type would have attributes drawn from these tables: airquality, co, co2, no2 and deployment. So mapping these attributes to a particular Entity type is a challenge.
As this is a one-time upload (not live data), I am thinking of two possibilities to go about it.
Add an LwM2M client that keeps sending data to an IoT Agent, which eventually passes it on to the Orion CB, until the last record.
Create a Python script that "pretends" to be a contextProvider to the Orion instance, sending data (say, every 5 seconds) until the last record.
I have not come across a case in my literature search that addresses such a situation. Are there any recommendations from the FIWARE Foundation for situations similar to this?
How would you suggest handling the mapping of data fields to an Entity's attributes when they actually need to be combined from several tables?
IoT Agent usage makes sense when you have live data (I mean, a real device sending information to the FIWARE platform). However, you say this is a one-time upload, so the Python script option seems better in this case.
(A little terminological comment here: your script will take the role of a context producer. A context provider is a different actor, related to registrations and query/update forwarding. See this piece of documentation for additional detail.)
With regard to the mapping of data fields to an Entity's attributes, I don't have any particular suggestion. This is just a matter of analyzing the data model (i.e. the entity attributes) and finding out how to populate that information from the data in your tables.
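A rough sketch of such a one-off upload script, acting as a context producer against Orion's NGSI v2 API, might look like this. The MySQL connection details, table layout and attribute mapping below are assumptions to adapt to your own schema:

```python
import time
import requests
import mysql.connector  # assumed driver; any MySQL client works

ORION = "http://localhost:1026"  # Orion Context Broker endpoint (assumed)

db = mysql.connector.connect(host="localhost", user="user",
                             password="secret", database="sensors")
cur = db.cursor(dictionary=True)
# Hypothetical join combining the airquality and deployment tables.
cur.execute("""
    SELECT a.id, a.co, a.no2, d.location
    FROM airquality a JOIN deployment d ON a.deployment_id = d.id
""")

for row in cur:
    entity = {
        "id": f"AirQualityObserved:{row['id']}",
        "type": "AirQualityObserved",
        "CO":  {"type": "Number", "value": row["co"]},
        "NO2": {"type": "Number", "value": row["no2"]},
        "location": {"type": "Text", "value": row["location"]},
    }
    # NGSI v2 entity creation; ?options=upsert updates the entity if it already exists.
    requests.post(f"{ORION}/v2/entities?options=upsert", json=entity).raise_for_status()
    time.sleep(0.1)  # gentle pacing; not strictly required for a one-off load
```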

Design database schema to support multi-tenant in MYSQL

I'm working on school manager software in ASP that connects to a MySQL DB. The software works great when I deploy it on a local machine for each user (school), but I want to migrate the software to the Azure cloud. The users will have an account to connect to the same app, but their data must not mix with other schools' data. My problem is finding the best way to deploy and manage the database.
Must I deploy one DB for each school?
Or put all schools' data in the same DB?
I'm not sure my solutions are the best ways.
For example, I don't want a STUDENT table that contains students for school X, school Y, ...
Please help me find the best solution.
There are multiple possible ways to design schema to support multi-tenant. The simplicity of the design depends on the use case.
Separate the data of every tenant (school) physically, i.e., one schema must contain data related to only a specific tenant (see the sketch after the cons below).
Pros:
Easy for A/B testing. You can release updates that require database changes to some tenants first and make them available to others over time.
Easy to move a tenant's database from one data center to another, and to support different backup SLAs for different customers.
Per-tenant database-level customization is easy: adding a new table for a customer, or modifying/adding a field, is straightforward.
Third-party integrations are relatively easy, e.g., connecting your data with Google Data Studio.
Scaling is relatively easy.
Retrieving data for one tenant is easy, without worrying about mixing up foreign key values.
Cons:
When you have to modify any field/table, your application code needs to handle cases where the alteration has not yet been applied in some databases.
Retrieving analytics across customers becomes difficult; designing queries for usage analysis becomes harder.
When integrating with other database systems, especially NoSQL, you will need more resources, e.g., indexing data in Elasticsearch for every tenant requires an index per tenant, and if there are thousands of customers, this results in thousands of shards.
Common data shared across tenants needs to be copied into every database.
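A minimal sketch of what the physical separation can look like at the application level (connection parameters and the tenant-to-database naming scheme are assumptions): the tenant ID simply picks which database to connect to, and the tables themselves carry no school column.

```python
import mysql.connector  # assumed driver

# One physical database per school; the tenant ID picks the schema to connect to.
def connection_for_tenant(school_id: str):
    return mysql.connector.connect(
        host="db.internal",
        user="app",
        password="secret",
        database=f"school_{school_id}",  # e.g. school_x, school_y
    )

conn = connection_for_tenant("x")
cur = conn.cursor()
cur.execute("SELECT id, name FROM student")  # no school column needed here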
Separate the data for every tenant (school) logically, i.e., one schema contains data for all the tenants (see the sketch after the cons below).
Pros:
Software releases are simple.
Easy to query usage analytics across multiple tenants.
Cons:
Scaling is relatively tricky. May need database sharding.
Maintaining the logical isolation of data for every tenant in all the tables requires more attention and may cause data corruption if not handled carefully at the application level.
Designing the database system for an application that supports multiple regions is complicated.
Retrieving all the data for a single tenant is difficult. (Remember: records will be associated with other records through foreign keys.)
This is not a comprehensive list; it is based on my experience working with both types of design. Both designs are common and are used by many organizations, depending on the use case.
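For the second design, a hedged sketch of the shared schema (table and column names are illustrative): every tenant-owned table carries a school_id, and every query must be scoped by it.

```python
import mysql.connector  # assumed driver

conn = mysql.connector.connect(host="db.internal", user="app",
                               password="secret", database="school_manager")
cur = conn.cursor()

# Single shared schema: the tenant is a column, not a separate database.
cur.execute("""
    CREATE TABLE IF NOT EXISTS student (
        id        INT AUTO_INCREMENT PRIMARY KEY,
        school_id INT NOT NULL,            -- tenant discriminator
        name      VARCHAR(255) NOT NULL,
        INDEX idx_student_school (school_id)
    )
""")

def students_for_school(school_id: int):
    # Every query must filter on school_id to preserve logical isolation.
    cur.execute("SELECT id, name FROM student WHERE school_id = %s", (school_id,))
    return cur.fetchall()
```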

Microservice Database shared with other services

Something I have searched for but cannot find a straight answer to is this:
For a given service, if there are two instances of that service deployed to two machines, do they share the same persistent store or do they have separate stores with some syncing mechanism (master/slave, clustering)?
E.g. I have an OrderService backed by MySQL. We're getting many orders, so I need to scale this service up and deploy a second OrderService instance. Where does its data come from?
It may sound silly, but to me every discussion makes it seem like the service and database are a packaged unit that are deployed together, yet few discussions mention what happens when you deploy a second instance.
Posting this as an answer because it's too long for a comment.
Microservices are self-contained components and as such are responsible for their own data. If you want to get to the data, you have to talk to the service's API. This applies mainly to different kinds of services (i.e. you don't share a database among services that offer different kinds of business functionality). That's bad practice because you couple services at the hip through the database, and it then becomes easy to couple more things that would normally be done at the API level, simply because it's more convenient to do them through the database; you risk losing componentization.
But if you have the same kind of service, then there are, as you mentioned, two obvious choices: share a database, or have each service instance contain its own database.
Now you have to ask yourself which solution to choose:
Are these OrderServices of yours truly capable of working on their own, or do you need to have all the orders in the same database for reporting or access by other applications?
Determine what your actual bottleneck is. Is it the database? If not, then share the database. Is it the services? If not, then distribute your data.
If you need to distribute the data, what are your choices and what are your needs? Do you need to be consistent all the time, or is eventual consistency good enough? Do you need separate databases that you synchronize manually, or does your database installation handle replication and partitioning out of the box?
etc.
What I'm trying to say is that in this kind of situation the answer is: it depends. And something that we tech geeks often forget to do before embarking on such distributed/scalability/architecture journeys is to talk to the business. Often the business can handle a certain degree of inconsistency, suboptimal processes or looking up data in more than one place (i.e. what you think is important might not necessarily be important to the business). So talk to them and see what they can tolerate. It might be cheaper to resolve something operationally than to invest a lot in trying to build a highly distributable system.
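For the common case where the instances are identical and stateless, the shared-database option from the answer above amounts to nothing more than both OrderService instances pointing at the same MySQL database. A minimal sketch (connection details are assumptions):

```python
import os
from mysql.connector import pooling  # assumed driver

# Every OrderService instance, on whichever machine it runs, reads the same
# connection settings; the database is the shared source of truth.
pool = pooling.MySQLConnectionPool(
    pool_name="orders",
    pool_size=10,
    host=os.environ.get("ORDERS_DB_HOST", "orders-db.internal"),
    user="orders",
    password=os.environ["ORDERS_DB_PASSWORD"],
    database="orders",
)

def create_order(customer_id: int, total: float) -> int:
    conn = pool.get_connection()
    try:
        cur = conn.cursor()
        cur.execute(
            "INSERT INTO orders (customer_id, total) VALUES (%s, %s)",
            (customer_id, total),
        )
        conn.commit()
        return cur.lastrowid
    finally:
        conn.close()  # returns the connection to the pool
```

If reads later become the bottleneck, the instances can be pointed at read replicas instead, which is one of the "does your database handle replication out of the box" questions above.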

Any drawback of building website based on JSON API for Data Access Layer

For instance, in ecommerce websites we generally have two interfaces: one with which customers interact and place orders, and one with which company employees interact to manage orders, customers, etc.
Suppose we divide this website into two different websites; that means two different projects altogether, not dependent on each other. The only thing common to both websites will be the database; both will be using the same database. Then what would be a good option for building the Data Access Layer?
Each website has its own database access code and entities.
Link both websites with a centralized layer which exposes read/write access to the database through a JSON-based API.
In my opinion, the second option would be better. It removes the direct dependency on the database, so any changes to the database need not be made in two places, among other benefits.
But my only concern is how much it could hamper the performance of the overall system, because in that case we are serializing and de-serializing objects and also making use of HTTP connections.
Could someone please shed some light on the benefits and drawbacks of an API-backed Data Access Layer compared to each website having its own database access code?
People disagree about the best architecture for this sort of thing, but one common and popular architectural guideline suggests that you avoid integrating two products at the database layer at all costs. It is simpler to have two separate apps and databases that can change independently of each other, and if you need to reference data from one in the other, you should have some sort of event pipeline between the two, configured on the ESB.
And, you should probably have more than two back end databases anyway -- unless you have an incredibly simple system with only the two classes of objects you mentioned, you'll probably find that you have more than two bounded domains.
Also, if your performance requirements increase, then you'll probably want to look at splitting the read and write sides of your services and databases, connecting the two sides through an eventing system of some sort (maybe event sourcing).
Before you decide what to do, you should read Implementing Domain-Driven Design by Vaughn Vernon, the paper on CQRS by Martin Fowler, and the paper on event sourcing, also from Dr Fowler. For extra points you should also read Fowler on microservices architecture.
Finally, on JSON -- and I'm a big fan -- you should only use it at the repository interface if you're either using JavaScript on the back end (which is a great idea if you're using io.js and Koa) and the front end (Backbone & Marionette, please), or if you're using a data source that natively emits JSON. If you have to parse it, then it's only going to slow you down, so use some format native to the data source and its consumers; that way you'll be as fast as possible.
An API-centric approach makes more sense, as the data is standardised, and it gives you more flexibility by being usable from any language and for one or multiple interfaces.
Performance-wise, this would largely depend on the quality and implementation of the technology stack behind the API. You could also look at caching certain data on the frontend to improve page load times.
The guys over at moltin have already built a platform like this and I've had great success using it. There's already a backend dashboard and the response times are pretty fast too!
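To make the question's second option concrete, here is a hedged sketch of such a centralized JSON data-access layer (framework choice, endpoint names and schema are assumptions, not a recommendation over the answers above):

```python
from flask import Flask, jsonify, request
import mysql.connector  # assumed driver

app = Flask(__name__)

def db():
    return mysql.connector.connect(host="db.internal", user="shop",
                                   password="secret", database="shop")

@app.route("/orders/<int:order_id>", methods=["GET"])
def get_order(order_id):
    conn = db()
    cur = conn.cursor(dictionary=True)
    cur.execute("SELECT id, customer_id, status, total FROM orders WHERE id = %s",
                (order_id,))
    row = cur.fetchone()
    conn.close()
    return (jsonify(row), 200) if row else (jsonify({"error": "not found"}), 404)

@app.route("/orders/<int:order_id>/status", methods=["PUT"])
def update_status(order_id):
    new_status = request.get_json()["status"]
    conn = db()
    cur = conn.cursor()
    cur.execute("UPDATE orders SET status = %s WHERE id = %s", (new_status, order_id))
    conn.commit()
    conn.close()
    return jsonify({"id": order_id, "status": new_status})
```

Both the customer site and the admin site would call these endpoints over HTTP instead of opening their own database connections, which is exactly where the serialization and connection overhead the question worries about comes in.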

Notifying applications on DB INSERT

Consider an application with two components, possibly running on separate machines:
Producer - Inserts records into a database, but does little to no reading from the database. Multiple instances may be running concurrently.
Consumer - Must be notified when a record is inserted into the database by a producer instance. May also have multiple instances.
What is the best way to perform the notifications, assuming that producers will be inserting 10-100 records into the database per second at peak times? The database technology is currently MySQL, but this is not necessarily set in stone. I can see a few different ways:
Use something like a MySQL message queue to "push" INSERT notifications to subscribers (consumers). Producers would have no knowledge that this was occurring.
Have producers interact with an intermediate layer that performs the INSERT, and pushes notifications to a message queue that consumers are subscribed to.
Have consumers poll the database frequently to check for new additions (seems like a bad idea)
etc.
As far as coupling is concerned: is it a good idea to have two relatively separate application components perform direct queries on a shared database, or should one component "own" the database while the other component indirectly interacts with the DB via calls to the owning component?
I like the second proposed solution (the intermediate layer), as it separates the notification from the database work, and could possibly be part of a two-phase commit XA transaction. If the consumers need the database content in addition to the notification, that can be accomplished via MySQL replication. This could also address the coupling question, as the consumer components could have read-only access to their replicated instances.
Using a messaging solution would also address any potential bottlenecks in the database-only solution, as it would separate the notification and storage into separate processes.
Depending on the language, you have a number of choices for the message distribution. If you're using Java, I'd actually recommend JGroups rather than JMS, as it's somewhat easier to configure.
If Java isn't your language of choice, Apache ActiveMQ supports a number of languages for interfacing. Apache Qpid is an AMQP implementation that also supports a number of languages (Java, C++, Python, Ruby, etc.).
Other messaging options could include XMPP, STOMP, or RestMS implementations.
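As a hedged sketch of the intermediate-layer option (option 2 in the question), assuming RabbitMQ as the broker with the pika client and hypothetical table and exchange names: the layer performs the INSERT and then publishes a notification that any number of consumers can subscribe to.

```python
import json
import pika                # assumed AMQP client / RabbitMQ broker
import mysql.connector     # assumed MySQL driver

db = mysql.connector.connect(host="db.internal", user="app",
                             password="secret", database="records")
mq = pika.BlockingConnection(pika.ConnectionParameters("mq.internal"))
channel = mq.channel()
channel.exchange_declare(exchange="records.inserted", exchange_type="fanout")

def insert_record(payload: dict) -> int:
    """Intermediate layer: write the row, then notify all subscribers."""
    cur = db.cursor()
    cur.execute("INSERT INTO record (data) VALUES (%s)", (json.dumps(payload),))
    db.commit()
    record_id = cur.lastrowid
    # Note: without an XA/two-phase commit (as mentioned above), the publish can
    # be lost if the process dies between the commit and this call.
    channel.basic_publish(exchange="records.inserted", routing_key="",
                          body=json.dumps({"id": record_id}))
    return record_id
```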