Any drawbacks of building a website on a JSON API for the Data Access Layer - json

For instance, an ecommerce website generally has two interfaces: one that customers use to place orders, and one that company employees use to manage orders, customers, etc.
Suppose we divide this website into two different websites, that is, two separate projects that do not depend on each other. The only thing they have in common is the database: both websites use the same one. What would then be a good option for building the Data Access Layer?
1. Each website has its own database access code and entities.
2. Link both websites to a centralized layer that exposes read/write access to the database through a JSON-based API.
In my opinion, the second option would be better, as it removes the duplicated dependency on the database: any change made in the database does not need to be made in two places. There are many other benefits as well.
But my only concern is how much it could hamper the performance of the overall system, because in that case we are serializing and deserializing objects and also making HTTP connections.
Could someone please shed some light on the benefits and drawbacks of an API-backed Data Access Layer compared with each website having its own database access code?
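To make the second option concrete, here is a minimal sketch of what such a centralized layer might look like, written in Python with an in-memory SQLite database. The `orders` table, the `action` names and the `handle_request` function are all assumptions made up for illustration, and the HTTP transport is left out entirely. The comments mark where the serialization and deserialization cost mentioned in the question would be paid.

```python
import json
import sqlite3


def make_db():
    # Hypothetical shared database; both websites would reach it only
    # through handle_request below, never directly.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    return conn


def handle_request(conn, request_json):
    """Accept a JSON request string and return a JSON response string."""
    request = json.loads(request_json)  # deserialization cost is paid here
    if request["action"] == "create_order":
        cur = conn.execute(
            "INSERT INTO orders (customer, total) VALUES (?, ?)",
            (request["customer"], request["total"]),
        )
        conn.commit()
        response = {"ok": True, "id": cur.lastrowid}
    elif request["action"] == "list_orders":
        rows = conn.execute(
            "SELECT id, customer, total FROM orders ORDER BY id"
        ).fetchall()
        response = {
            "ok": True,
            "orders": [
                {"id": r[0], "customer": r[1], "total": r[2]} for r in rows
            ],
        }
    else:
        response = {"ok": False, "error": "unknown action"}
    return json.dumps(response)  # serialization cost is paid here
```

Both the customer site and the back-office site would send requests like `{"action": "list_orders"}` to this one layer, so a schema change only has to be absorbed in one place.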

People disagree about the best architecture for this sort of thing, but one common and popular architectural guideline suggests that you avoid integrating two products at the database layer at all costs. It is simpler to have two separate apps and databases that can change independently of each other; if you need to reference data from one in the other, you should have some sort of event pipeline between the two, configured on the ESB (enterprise service bus).
And, you should probably have more than two back end databases anyway -- unless you have an incredibly simple system with only the two classes of objects you mentioned, you'll probably find that you have more than two bounded domains.
Also, if your performance requirements increase, you'll probably want to look at splitting the read and write sides of your services and databases and connecting the two sides through an eventing system of some sort (maybe event sourcing).
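As a rough illustration of that read/write split, the sketch below keeps a write side that records events and a read side that builds a query-friendly view from them. Everything here (`OrderPlaced`, `WriteSide`, `ReadSide`, `replay`) is a hypothetical name chosen for the example, and the "eventing system" is just an in-process list, not a real message bus.

```python
from dataclasses import dataclass, field


@dataclass
class OrderPlaced:
    order_id: int
    customer: str
    total: float


@dataclass
class WriteSide:
    """Handles commands and records what happened as events."""
    events: list = field(default_factory=list)

    def place_order(self, order_id, customer, total):
        if total <= 0:
            raise ValueError("total must be positive")
        event = OrderPlaced(order_id, customer, total)
        self.events.append(event)
        return event


@dataclass
class ReadSide:
    """Keeps a query-friendly view that is updated only from events."""
    totals_by_customer: dict = field(default_factory=dict)

    def apply(self, event):
        current = self.totals_by_customer.get(event.customer, 0.0)
        self.totals_by_customer[event.customer] = current + event.total


def replay(events):
    # Rebuild the read model from the full event history.
    view = ReadSide()
    for event in events:
        view.apply(event)
    return view
```

The read side never writes orders and the write side never answers queries, which is what would let the two be scaled or stored separately.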
Before you decide what to do, you should read Implementing Domain-Driven Design by Vaughn Vernon, as well as Martin Fowler's articles on CQRS and on event sourcing. For extra points, you should also read Fowler on microservices architecture.
Finally, on JSON: I'm a big fan, but you should only use it at the repository interface if you're either using JavaScript on the back end (which is a great idea if you're using io.js and Koa) and the front end (Backbone & Marionette, please), or using a data source that natively emits JSON. If you have to parse it, it's only going to slow you down, so use a format native to the data source and its consumers; that way you'll be as fast as possible.

An API-centric approach makes more sense, as the data is standardised and you gain flexibility: it can be consumed from any language, by one or multiple interfaces.
Performance-wise, this would depend greatly on the quality and implementation of the technology stack behind the API. You could also look at caching certain data on the frontend to improve page load time.
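The caching idea can be sketched as a small time-to-live cache sitting in front of the API call. This is only an illustration in Python: `TTLCache` and `get_or_fetch` are made-up names, and `fetch` stands in for whatever function actually calls the API.

```python
import time


class TTLCache:
    """Remember fetched values for ttl_seconds before asking the API again."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the cache can be tested
        self._store = {}            # key -> (expiry_time, value)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]           # still fresh: no API call
        value = fetch(key)          # missing or expired: call the API
        self._store[key] = (now + self.ttl, value)
        return value
```

Data that rarely changes (product categories, country lists) is the obvious candidate; anything where staleness matters needs a short TTL or no caching at all.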
The guys over at moltin have already built a platform like this and I've had great success using it. There's already a backend dashboard and the response times are pretty fast too!

Related

Ways to structure an application that has two clear parts

I am on a project that has a seemingly infinite number of tables. We have to come up with a solution that brings scalability to the platform, and we can't seem to figure out what a really good one would be.
The platform is a job-search site, so it has two clear parts: candidates and companies.
We've been thinking it over and have come up with these possible solutions to restructure the current database, as it is a monster.
2 APIs, 2 Databases: This way would take a lot of database migration work, but would define the different parts of the platform very clearly.
2 APIs, 1 Database: Doing this, the database work would be reduced to normalizing what we have now, but we would still have the two parts of the platform logically separated.
1 API, 1 Database: Normalize the database and do everything in the same API, trying to logically separate everything, making it scalable but at the same time accessible from one part to the other.
Right now I am leaning towards the 1 API, 1 Database solution, but we would like to hear from experienced users before making the final choice.
Thank you!
I was in a situation kind of like yours some years ago. I will try to express my thoughts on how we handled it. All this might sound opinionated but each and every task is different, therefore the implementations are as well.
The two largest problems I notice:
Having an infinite number of tables is the first sign that your current database schema design is a Big Ball of Mud.
Acknowledging that you have a monster database indicates that you had better start refactoring it into smaller pieces. Yes, I know it's never easy.
It would add a lot more value to your question if you would show us some of the architectural details/parts of your codebase, so we could give better suited ideas.
Please forgive me for linking Domain Driven Design related information sources. I know that DDD is not about any technological fluff, however the strategy you need to choose is super important and I think it brings value to this post.
Know your problem domain
Before you start taking your database apart, you should clearly understand how your problem domain works. To put it simply, the problem domain is the set of business problems you are trying to solve with the strategy you are going to apply.
Pick your strategy
The most important thing here is: the business value your strategy brings. The proposed strategy in this case is to make clear distinctions between your database objects.
Be tactical!
We chose the strategy; now we need to define the tactics to apply to this refactoring. Our tactics here should be clearly set out, like this:
Separate the related database objects that belong together, this defines explicit boundaries.
Make sure the connections between the regrouped database objects remain intact and are working. I'm talking about cross table/object references here.
Let's get technical - the database
How to break things
I personally would split up your current schema into three separate parts:
Candidates
Companies
Common tables
Reasoning
By strategically splitting up these database objects you consciously separate these concerns. This separation lets you have a new thing: tactical boundary.
Each of your newly separated schemas now has a different context and different boundaries. For example, there is the Candidates schema's bounded context. It groups together business concepts, rules, etc. The same applies to the Companies schema.
The only difference is the Common tables schema. This could serve as a shared kernel (a bridge, if you like) between your other databases, containing all the shared tables that every other schema needs to reach.
Outcome
All that has been said could bring you up to a level where you can:
Backup/restore faster and more conveniently
Scale database instances separately
Easily set/monitor the access of database objects defined per schema
The API
How to glue things
This is the point where it gets really greasy; implementing an API is really dependent on your business use case. I personally would design two different public APIs.
Example
For Candidates
For Companies
The same design principles apply here as well. The only difference is that I see no added business value in building an API for the Common tables. It could be just a simple database schema which both of the main APIs could query or send commands to.
In my humble opinion, separating the databases results in some content management difficulties. Both of the separate parts will contain exactly the same tables, like job positions, cities, business areas, etc. How will you maintain these tables? Will you insert the country "Zimbabwe" into both of them? What if their primary keys are not equal? At some point you will need to use data from both separated databases, and which record of "Zimbabwe" will be used then? I'm not talking about performance, but using the same database for these two projects will make life easier for you. Also, we are in the cloud age and you can scale your single database service/server/droplet as you want. For clarity of modules, you can define naming conventions. For example, if a table is used by both parts, add the prefix "common_"; if a table is only used by candidates, use "candidate_"; etc.
For the API, you can use the same methodology, too. Define three API parts: common, candidates and companies. But in that case you should build a well-tested authentication and authorization layer for your API.
If I were you, I'd choose 1 API, 1 Database.
If it fails, separating 1 API into 2 APIs or 1 database into 2 databases is much easier than merging them (humble opinion...).
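The prefix naming convention from this answer could look roughly like the sketch below, using Python and an in-memory SQLite database. The table and column names (`common_country`, `candidate_profile`, `company_profile`) are assumptions invented for the example; the point is only that a single "Zimbabwe" row is shared by both parts.

```python
import sqlite3

# Hypothetical single schema: the prefix tells you which part owns a table.
SCHEMA = """
CREATE TABLE common_country (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE candidate_profile (
    id INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL,
    country_id INTEGER NOT NULL REFERENCES common_country(id)
);
CREATE TABLE company_profile (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    country_id INTEGER NOT NULL REFERENCES common_country(id)
);
"""


def build():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # One "Zimbabwe" record, referenced by candidates and companies alike.
    conn.execute("INSERT INTO common_country (name) VALUES ('Zimbabwe')")
    conn.execute(
        "INSERT INTO candidate_profile (full_name, country_id) VALUES ('Ada', 1)"
    )
    conn.execute(
        "INSERT INTO company_profile (name, country_id) VALUES ('Acme', 1)"
    )
    return conn


def tables_with_prefix(conn, prefix):
    # List the tables that belong to one part of the platform.
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows if name.startswith(prefix)]
```

If the platform later has to be split, the prefixes already mark where the cut would go.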

Microservice Database shared with other services

Something I have searched for but cannot find a straight answer to is this:
For a given service, if there are two instances of that service deployed to two machines, do they share the same persistent store or do they have separate stores with some syncing mechanism (master/slave, clustering)?
E.g. I have an OrderService backed by MySQL. We're getting many orders in so I need to scale this service up, so we deploy a second OrderService. Where does its data come from?
It may sound silly but, to me, every discussion makes it seem like the service and database are a packaged unit that are deployed together. But few discussions mention what happens when you deploy a second service.
Posting this as an answer because it's too long for a comment.
Microservices are self-contained components and as such are responsible for their own data. If you want to get to the data, you have to talk to the service API. This applies mainly to different kinds of services: you don't share a database among services that offer different kinds of business functionality. That's bad practice because you couple the services at the hip through the database, and it then becomes easy to couple more things through the database that would normally be done at the API level, simply because it's more convenient. You risk losing componentization.
But if you have the same kind of service then there are, as you mentioned, two obvious choices: share a database or have each service contain its own database.
Now you have to ask yourself which solution to choose:
Are these OrderServices of yours truly capable of working on their own, or do you need to have all the orders in the same database for reporting or access by other applications?
Determine what your actual bottleneck is. Is it the database? If not, then share the database. Is it the services? If not, then distribute your data.
Do you need to distribute the data? What are your choices, and what are your needs? Do you need to be consistent all the time, or is eventual consistency good enough? Do you need to have separate databases and synchronize them manually, or does your database installation handle replication and partitioning out of the box?
etc.
What I'm trying to say is that in this kind of situation the answer is: it depends. And something that we tech geeks often forget to do before embarking on such distributed/scalability/architecture journeys is to talk to the business. Often the business can handle a certain degree of inconsistency, suboptimal processes, or looking up data in several places instead of one (i.e. what you think is important might not necessarily be important for the business). So talk to them and see what they can tolerate. It might be cheaper to resolve something operationally than to invest a lot in trying to build a highly distributable system.
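For the simpler of the two choices, sharing one database between instances of the same service, a minimal sketch could look like this. It uses Python with a SQLite file standing in for the MySQL server from the question; `OrderService`, `place_order` and the `orders` table are illustrative assumptions, and a real deployment would have the instances on separate machines talking to a database server.

```python
import os
import sqlite3
import tempfile


class OrderService:
    """Hypothetical service instance; each one opens its own connection."""

    def __init__(self, name, db_path):
        self.name = name
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS orders "
            "(id INTEGER PRIMARY KEY, item TEXT, handled_by TEXT)"
        )
        self.conn.commit()

    def place_order(self, item):
        cur = self.conn.execute(
            "INSERT INTO orders (item, handled_by) VALUES (?, ?)",
            (item, self.name),
        )
        self.conn.commit()
        return cur.lastrowid

    def all_orders(self):
        return self.conn.execute(
            "SELECT id, item, handled_by FROM orders ORDER BY id"
        ).fetchall()


def demo():
    # Two instances of the same service pointed at the same store.
    db_path = os.path.join(tempfile.mkdtemp(), "orders.db")
    a = OrderService("instance-a", db_path)
    b = OrderService("instance-b", db_path)
    a.place_order("book")
    b.place_order("lamp")
    result = (a.all_orders(), b.all_orders())
    a.conn.close()
    b.conn.close()
    return result
```

Both instances see every order regardless of which one accepted it, which is what makes reporting easy; the price is that the database becomes the shared bottleneck.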

how much work should we do in the database?

OK, I'm really confused as to exactly how much "work" should be done IN the database, and how much work should be done at the application level instead.
I mean, I'm not talking about obvious stuff, like that we should convert strings into SHA2 hashes at the application level instead of the database level.
But rather stuff that is blurrier, including, but not limited to: "should we retrieve the data for 4 columns and do the uppercase/concatenation at the application level, or should we do that at the database level and send the calculated result to the application level?"
And if you could list any other examples, that would be great.
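To make the four-column example concrete, here are both variants side by side, sketched in Python against an in-memory SQLite database. The `person` table and its columns are invented for the illustration; either function yields the same label, the only difference is where the uppercase/concatenation work happens.

```python
import sqlite3


def setup():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (title TEXT, first TEXT, middle TEXT, last TEXT)"
    )
    conn.execute("INSERT INTO person VALUES ('dr', 'ada', 'king', 'lovelace')")
    return conn


def label_in_database(conn):
    # The database concatenates and uppercases; one value per row comes back.
    rows = conn.execute(
        "SELECT upper(title || ' ' || first || ' ' || middle || ' ' || last) "
        "FROM person"
    )
    return [row[0] for row in rows]


def label_in_application(conn):
    # The database returns the 4 raw columns; the application does the rest.
    rows = conn.execute("SELECT title, first, middle, last FROM person")
    return [" ".join(row).upper() for row in rows]
```

For something this small the choice is mostly about where you want the formatting rule to live, not about speed.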
It really depends on what you need.
I like to do my business logic in the database; other people are religiously against that.
You can use triggers and stored procedures/functions in SQL.
Links for MySQL:
http://dev.mysql.com/doc/refman/5.5/en/triggers.html
http://www.mysqltutorial.org/introduction-to-sql-stored-procedures.aspx
http://dev.mysql.com/doc/refman/5.5/en/stored-routines.html
My reasons for doing business logic in triggers and stored procs
Note that I'm not talking about bending the database structure towards the business logic; I'm talking about putting the business logic in triggers and stored procedures.
It centralizes your logic. The database is a central place; everything has to go through it. If you have multiple insert/update/delete points in your app (or you have multiple apps), you'll need to do the checks multiple times; if you do them in the database, you only have to do the checks in one place.
It simplifies the application. E.g., you can just add a member, and the database will figure out whether the member is already known and take the appropriate action.
It hides the internals of your database from the application. If you do all your logic in the application, you will need intricate knowledge of your database in the application. If you use database code (triggers/procs) to hide that, you don't need to know every database detail in your app.
It makes it easier to restructure your database. If you have the logic in your database, you can just change a table layout, replace the old table with a blackhole table, put a trigger on it, and let the trigger do the updates to the new table. Your app does not even need to know the database has changed. This allows legacy apps to keep working unchanged, while new apps can use the improved database layout.
Some things are easier in SQL
Some things work faster in SQL
I don't like to use (lots of and/or complicated) SQL code in my application, I like to put SQL code in a stored procedure/function and try to only put simple queries in my application code, that way I can just write code that explains what I mean in my application and let the database layer do the heavy lifting.
Some people disagree strongly with this, but this approach works well for me and has simplified debugging and maintenance of my applications a lot.
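The "just add a member and let the database figure it out" idea can be sketched as follows. This is not MySQL: it uses Python with SQLite, where a view plus an INSTEAD OF trigger plays a role similar to the blackhole-table trick described above. The `member` table, the `member_entry` view and the `visits` counter are all assumptions made up for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE member (
    email TEXT PRIMARY KEY,
    name TEXT,
    visits INTEGER NOT NULL DEFAULT 1
);

-- The application writes to this view and never touches member directly.
CREATE VIEW member_entry AS SELECT email, name FROM member;

CREATE TRIGGER member_entry_insert INSTEAD OF INSERT ON member_entry
BEGIN
    -- Known member: refresh the name and count the visit.
    UPDATE member SET visits = visits + 1, name = NEW.name
    WHERE email = NEW.email;
    -- Unknown member: create the row.
    INSERT INTO member (email, name)
    SELECT NEW.email, NEW.name
    WHERE NOT EXISTS (SELECT 1 FROM member WHERE email = NEW.email);
END;
"""


def add_member(conn, email, name):
    # The application only states its intent; the trigger decides
    # whether this becomes an insert or an update.
    conn.execute(
        "INSERT INTO member_entry (email, name) VALUES (?, ?)", (email, name)
    )
    conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    add_member(conn, "ada@example.com", "Ada")
    add_member(conn, "ada@example.com", "Ada L.")
    add_member(conn, "bob@example.com", "Bob")
    return conn.execute(
        "SELECT email, name, visits FROM member ORDER BY email"
    ).fetchall()
```

The application code stays a one-line insert, which is the simplification this answer is arguing for; the trade-off is that the rule is now invisible unless you read the trigger.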
Generally, it's good practice to expect only "data" from the database. It's up to the application(s) to apply business/domain logic and make sense of the data retrieved. It's highly recommended to do the following things in the application layer:
1) Formatting Date
2) Applying Math functions, such as interpolation/extrapolation, etc
3) Dynamic sorting (based on columns)
However, situations sometimes warrant doing a few things at the database level.
In my opinion, the application should use data and the database should provide it, and that should be a clear separation of concerns. So the database returns records sorted, ordered and filtered according to the requested conditions, but it is up to the application to apply business logic to those records and "convert" them into something meaningful to the user.
For example, at my previous company we worked on a big application for work-time calculations. One of the obvious features of this kind of application is tracking employees' vacation days: how many days an employee has per year, how many have been used, how many are left, etc. We could have written some triggers and procedures to update those columns automatically, so that when an employee's vacation was approved, the number of days applied for would be taken from the "vacation pool" and added to "vacation days used". Pretty easy stuff, but we decided to make it explicit at the application level, and boy, very soon we were happy we did it that way. The application had to be labor-law compliant, and it quickly turned out that vacation days are not calculated equally for all employees, and that sometimes a vacation day is not really a vacation day at all, but that is beside the point. Had we put this "easy" operation in the database, we would have had to version the database with every little change to the vacation-related logic. That would have led us straight to hell in customer support, because as it was, customers could update only the application without needing to update the database (except at clear "breakthrough" moments when the database structure changed, of course).
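A very reduced sketch of keeping that rule explicit in application code might look like this. The class, the function and especially the `counts_as_vacation` flag are hypothetical; the real labor-law rules in that application are not described in this answer, so the flag only stands in for "sometimes a vacation day is not really a vacation day".

```python
from dataclasses import dataclass


@dataclass
class VacationAccount:
    days_per_year: int
    days_used: int = 0

    @property
    def days_left(self):
        return self.days_per_year - self.days_used


def approve_vacation(account, days_requested, counts_as_vacation=True):
    """Apply an approved request to the account.

    The rule lives in application code, so changing it means shipping a
    new application version, not a new database version.
    """
    if not counts_as_vacation:
        # Hypothetical special case: the absence does not reduce the pool.
        return account
    if days_requested > account.days_left:
        raise ValueError("not enough vacation days left")
    account.days_used += days_requested
    return account
```

The database would still store `days_per_year` and `days_used`; it just would not decide how they change.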
In my experience I've found that many applications start with a straightforward set of tables and then a handful of stored procedures to provide basic functionality. This works very well; it usually yields high performance and is simple to understand, and it also mitigates any need for a complex middle tier.
However, applications grow. It's not unusual to see large data-driven applications with thousands of stored procedures. Throw triggers into the mix and you have an application which, for anybody other than the original developers (if they're still working on it), is very difficult to maintain.
I will put a word in for applications which place most logic in the database - they can work well when you have some good database developers and/or you have a legacy schema which cannot be changed. The reason I say this is that ORMs take much of the pain out of this part of application development when you let them control the schema (if not, you often need to do a lot of fiddling to get it working).
If I was designing a new application then I would usually opt for a schema which is dictated by my application domain (the design of which will be in code). I would normally let an ORM handle the mapping between the objects and the database. I would treat stored procedures as exceptions to the rule when it came to data access (reporting can be much easier in sprocs than trying to coax an ORM into producing a complex output efficiently).
The most important thing to remember though, is that there are no "best practices" when it comes to design. It is up to you the developer to weigh up the pros and cons of each option in the context of your design.

Database responsibility

I'm starting with Databases. I've been playing around with MySQL and Informix, but never had a real life project.
What is the real responsibility of a database? Should we add stored procedures and functions to the database, or just let it be a data repository with no logic?
What is the real responsibility of a Database?
A database at its core is a system to store and retrieve data. A CSV file on disk + suitable tools (e.g. Excel) is a simple example of this. In addition, a database might provide additional capabilities, such as transaction control, data integrity, and security.
Should we add stored procedures and functions to the database, or just let it be a data repository with no logic?
What do you want from the database? If all you want is a "bit bucket", then by all means, store it in a plain file on disk and call it "the database". If you want a bit more than that, use a product that suits your needs. If you want to be able to query it using a 4GL like SQL, use MySQL. If you want transaction control, security, advanced query features, etc etc, use another DBMS if appropriate. Whatever product you choose, however, take advantage of that product. Otherwise you're wasting your time and money. Sure, you'll never use all of the features (only a subset will be useful to you), but if you use very few of them, you may as well downgrade to a simpler product.
If you're using Oracle, you can store procedures and functions (even better, whole packages) right there in the database alongside the data. The real question is, what do you need to write in those procedures and functions - business logic or presentation logic?
Personally, I usually prefer to keep business logic close to the data, whereas presentation logic is custom-made for each interface.
It is possible to create an API layer over your data so that no matter how your applications access your database, they will get a consistent view of it, and they will all modify it using a consistent mechanism. In other words, instead of writing the business logic multiple times (once for each interface), you write it once and once only, then re-use it everywhere.
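The answer has Oracle packages in mind; the same "write the rule once, reuse it from every interface" idea is sketched below in Python with SQLite, purely as an illustration. `MemberApi`, the age rule and the two callers (`web_signup`, `admin_import`) are all invented for the example.

```python
import sqlite3


class MemberApi:
    """Hypothetical single entry point for all writes to the member table."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS member "
            "(email TEXT PRIMARY KEY, age INTEGER)"
        )

    def register(self, email, age):
        # Business rule written once (assumed for this sketch):
        # members must be adults, and emails are stored in lower case.
        if age < 18:
            raise ValueError("members must be at least 18")
        self.conn.execute(
            "INSERT INTO member (email, age) VALUES (?, ?)",
            (email.lower(), age),
        )
        self.conn.commit()


def web_signup(api, form):
    # Interface 1: a web form, with its own presentation concerns.
    api.register(form["email"], int(form["age"]))


def admin_import(api, rows):
    # Interface 2: a bulk import that skips rows the rule rejects.
    imported = 0
    for email, age in rows:
        try:
            api.register(email, age)
            imported += 1
        except ValueError:
            pass
    return imported
```

Neither interface re-implements the rule, so the data ends up consistent no matter which path it came in through.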
There are two reasons I've heard why business logic should not be stored in the database:
1. Maintainability: it's hard to change. I never really understood this one. How hard is it to type CREATE OR REPLACE PACKAGE? I suspect it's just the burden of having to learn "yet another language".
2. Database independence: what works in Oracle won't work elsewhere. This is a biggie, and better minds than I have written about this one. Basically, if you really need it to be "database agnostic", you won't be able to use any of the advanced features of the database you bought, so you may as well just use the simplest/cheapest one you can find; in which case, you don't need it to work on every database anyway!
Generally it's considered good practice to not place business logic in your database. The main reason is maintainability. It is ok to use stored procedures still, but including business logic within those stored procedures makes your application harder to debug and update.
Including business logic in your database will also effectively tie you to using that one DBMS, and not allow the data layer to remain independent from your application. For example, you may encounter performance and scalability problems with one DB once your application is live, but due to business logic scattered throughout the db, migrating to a more scalable database will be time consuming at best.
If business logic is kept in application code (eg java or c#) and the data layer is abstracted using a data abstraction layer, and an ORM if language permits, then interchanging databases is much less problematic.
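A minimal sketch of that kind of data abstraction layer, in Python rather than Java or C#: the business function depends only on an interface, and two interchangeable backends implement it. `CustomerStore`, `InMemoryStore`, `SqliteStore` and `rename_customer` are hypothetical names, and no ORM is involved here.

```python
import sqlite3
from typing import Optional, Protocol


class CustomerStore(Protocol):
    """The only thing business code is allowed to know about storage."""

    def save(self, customer_id: int, name: str) -> None: ...

    def find(self, customer_id: int) -> Optional[str]: ...


class InMemoryStore:
    def __init__(self):
        self._rows = {}

    def save(self, customer_id, name):
        self._rows[customer_id] = name

    def find(self, customer_id):
        return self._rows.get(customer_id)


class SqliteStore:
    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def save(self, customer_id, name):
        self._conn.execute(
            "INSERT OR REPLACE INTO customer (id, name) VALUES (?, ?)",
            (customer_id, name),
        )

    def find(self, customer_id):
        row = self._conn.execute(
            "SELECT name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


def rename_customer(store: CustomerStore, customer_id: int, new_name: str) -> str:
    # Business logic: works with any backend that satisfies CustomerStore.
    if not new_name.strip():
        raise ValueError("name must not be empty")
    store.save(customer_id, new_name.strip())
    return store.find(customer_id)
```

Swapping the database then means writing one more store class, while `rename_customer` stays untouched.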
We should be striving for separation of concerns, and keeping business logic out of the db helps achieve that.
edit: There are also performance concerns which may dictate that stored procedures are a good place to keep business logic. Containing logic within the data tier (i.e. the sproc) in some cases reduces the many round trips between the data abstraction layer and the database, which can give a performance boost. I've worked on systems like this in the past, for this reason, but I've always found them difficult to maintain. The problem is that you can look through the classes and procedures, see the business logic, and think that's all there is, yet not see how a particular bug or process can be occurring. Then you find the stored procedure and see the other half of the business operation (a real pain when the sproc is 1000 lines long!).
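The round-trip point can be illustrated with a small sketch. SQLite has no stored procedures, so a single set-based statement stands in for the sproc here, and because the database is in-memory the example only counts statements rather than measuring network latency. The `orders` and `order_line` tables are assumptions for the illustration.

```python
import sqlite3


def setup():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
        CREATE TABLE order_line (order_id INTEGER, amount INTEGER);
        INSERT INTO orders VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO order_line VALUES (1, 10), (1, 5), (2, 7);
    """)
    return conn


def totals_with_many_round_trips(conn):
    # One query for the orders, then one more query per order.
    queries = 1
    totals = {}
    orders = conn.execute("SELECT id, customer FROM orders").fetchall()
    for order_id, customer in orders:
        (total,) = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM order_line "
            "WHERE order_id = ?",
            (order_id,),
        ).fetchone()
        queries += 1
        totals[customer] = total
    return totals, queries


def totals_with_one_round_trip(conn):
    # The same work pushed into the data tier as a single statement.
    rows = conn.execute(
        "SELECT o.customer, COALESCE(SUM(l.amount), 0) "
        "FROM orders o LEFT JOIN order_line l ON l.order_id = o.id "
        "GROUP BY o.id, o.customer"
    ).fetchall()
    return dict(rows), 1
```

With a remote database each extra statement costs a network round trip, so the gap grows with the number of orders.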
As with many things, where you place your business logic depends on the particular problem you're trying to solve.
We have a lot of data around us that can be of great use. An ordered collection of information helps businesses make better decisions, and a database is ordered storage of information.
Responsibility: in the common case, there is a lot of information around. Data is an ordered collection of information about an entity, and a database is an ordered collection of data about a group of entities. The software that manages such databases is a DBMS (database management system). The responsibility of the database is to organize information.
Stored procedures and functions are more like the business processes you need in order to collect the data you want.
A starting point:
Pick a database from {PostgreSQL, MySQL, SQL Server (Express edition)} and install it.
Learn about Codd Rules, Normal forms, Good resource
Start learning SQL, write queries.
Understand the basics involved in schema creation.
Learn procedural language implementation in database.
Ask questions on SO.

Multiple Domains Site Design Decision

I am developing a project whose domain name is meaningful in my native language, so I bought a second, English domain for global usage.
My question is, how should I structure my site?
Two different projects, or one project with localization support?
Two different databases, or a shared database?
What is my goal?
I don't want to show English content on the native-language site, and vice versa.
I want to be able to update the site easily.
If you suggest a shared database, could you please describe the design principles of the database?
Thank you.
Typically, for application code you ideally don't want to fork for any reason, including language. There are some quick things you need to watch out for:
Ensure that strings are not hardcoded.
Store all datetimes in UTC.
Ensure that all user profiles have an associated timezone (you can grab this from the user's browser).
Try to ensure that your presentation is separate from your page content (i.e. use CSS, Master Pages, Templates or whatever your platform supports).
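The "store in UTC, display in the user's timezone" items from the list above can be sketched in a few lines of Python. The function names are made up, and fixed UTC offsets are used only to keep the example self-contained; a real site would more likely store a named zone per user profile.

```python
from datetime import datetime, timedelta, timezone


def to_storage(local_dt):
    """Convert an aware datetime to UTC before it is stored."""
    if local_dt.tzinfo is None:
        # A naive datetime carries no timezone, so it cannot be converted safely.
        raise ValueError("refusing to store a naive datetime")
    return local_dt.astimezone(timezone.utc)


def to_display(stored_utc, user_tz):
    """Convert a stored UTC datetime to the viewing user's timezone."""
    return stored_utc.astimezone(user_tz)
```

For instance, a post written at 12:00 by a user at UTC+3 is stored as 09:00 UTC and shown as 04:00 to a reader at UTC-5.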
As for the database, this depends more on the data you're holding. For example, if:
You want users to share logins across both sites
Knowledge to be shared but not necessarily localized (Wiki Entries)
The sites are managing a shared resource (i.e. a single warehouse)
You might want to have one database.
However, if you find the following are true:
You don't want/need users between the sites to have cross over (think amazon.com and amazon.co.uk)
Knowledge is wholly separate with entries in one language being irrelevant to the other
The sites are managing wholly separate resources (i.e. two separate warehouses)
You might lean towards two separate databases. This will give you an advantage in scaling (though it's not a silver bullet), and as long as the schemas are identical across the databases you will likely find that it's not too onerous.
One other option is to identify shared resources and split them into another repository (think user logins etc...). This can get you the best of both worlds but of course is a more complex design.
Remember, all of this can be added after the fact; it just becomes harder. Sometimes it's more important to get to market than it is to try and solve all your problems up front.
Good Luck!
I'm not quite sure what would work for you, but I think localization support would be nice. With a shared database you won't need to support two different databases, and you won't need to add an extra database every time you add a new language. On the application side, it would also be easier to add another language through configuration than to create another project just for that.
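One possible shape for such a shared database is a translation table keyed by language, sketched here with Python and SQLite. The table names, the `SUPPORTED_LANGUAGES` setting and the `'native'` language key are placeholders invented for the example (the question does not say which language the native site uses).

```python
import sqlite3

# Hypothetical configuration: adding a language means adding an entry here
# plus translation rows, not a new project or a new database.
SUPPORTED_LANGUAGES = ["native", "en"]


def setup():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE article (id INTEGER PRIMARY KEY);
        CREATE TABLE article_translation (
            article_id INTEGER NOT NULL REFERENCES article(id),
            lang TEXT NOT NULL,
            title TEXT NOT NULL,
            PRIMARY KEY (article_id, lang)
        );
        INSERT INTO article VALUES (1), (2);
        INSERT INTO article_translation VALUES
            (1, 'native', 'native title 1'),
            (1, 'en', 'Hello'),
            (2, 'native', 'native title 2');
    """)
    return conn


def titles_for_site(conn, lang):
    # Each domain passes its own language, so it only ever sees its own content.
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError("unsupported language: " + lang)
    rows = conn.execute(
        "SELECT title FROM article_translation "
        "WHERE lang = ? ORDER BY article_id",
        (lang,),
    ).fetchall()
    return [row[0] for row in rows]
```

Article 2 has no English translation, so it simply does not appear on the English domain, which matches the goal of not showing one site's content on the other.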