Feedback on this back-end stack - JSON

I'm planning to set up a scalable architecture capable of providing web services over a REST interface that returns JSON.
The web services will be quite simple for a CRUD web 2.0 app.
I think JavaScript (Node.js + MongoDB) is a good choice for the following reasons:
Easy to find JavaScript developers
Good performance
Easy to scale
Shared logic/language or possible reuse of code between the database query language, the back-end, and the web client
There are testing and logging frameworks for Node
From the examples I've seen, Node seems light in terms of the lines of code needed to implement web services.
Questions:
I think of scaling a Node app which supplies a web service as having a central node that routes/balances load across the Node instances, which would also help with seamless updates. Is there an existing piece of software that fits that task?
Please point out all the disadvantages or other advantages you find in this back-end stack.
If you feel this question opens too much of a debate and doesn't fit Stack Overflow policy, please point me to a forum where I could get feedback.
Are there any good persistence choices other than MongoDB? This choice mainly comes from the JavaScript query language and JSON schemas.

Regarding your "router" piece:
Since your REST API will be composed of HTTP requests, it's common practice to use a high-speed proxy such as NGINX or HAProxy to distribute requests among the many servers that actually perform the work (in your case, Node.js servers). This generally works well and allows easy scaling and failover.
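If you want to prototype that router piece in Node itself before putting NGINX or HAProxy in front, here is a rough round-robin proxy sketch using the node-http-proxy package; the ports and the instance list are hypothetical:

const http = require('http');
const httpProxy = require('http-proxy'); // npm install http-proxy

// Hypothetical pool of Node instances doing the actual work.
const targets = ['http://127.0.0.1:3001', 'http://127.0.0.1:3002', 'http://127.0.0.1:3003'];
const proxy = httpProxy.createProxyServer({});
let next = 0;

http.createServer((req, res) => {
  const target = targets[next]; // pick the next instance, round-robin
  next = (next + 1) % targets.length;
  proxy.web(req, res, { target }, () => {
    res.writeHead(502); // a real balancer would retry another instance
    res.end('Bad gateway');
  });
}).listen(8080);

In production, NGINX's upstream block or HAProxy's backend section gives you the same behaviour plus health checks and graceful reloads, which is what makes the seamless updates you mention practical.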

Related

Is there a standard PubSub protocol over WebSocket?

I'm looking for a way to implement basic Publish / Subscribe between applications written in different languages, to exchange events with JSON payloads.
WebSocket seems like the obvious choice for the transport, but you need an (arguably small) layer on top to implement some of the plumbing:
agreeing on messages representing the pub/sub domain ("subscribe to a topic", "publish a message")
agreeing on messages for the infrastructure ("heartbeat", "authentication")
I was expecting to find an obvious standard for this, but there does not seem to be any.
WAMP is often referred to, but in my (short) experience, the server/client library implementations are not great.
STOMP is often referred to, but in my (even shorter) experience, it's even worse.
Phoenix Channels are nice, but they're restricted to the Phoenix/Elixir world and not standard (so the messages can change at any Phoenix version without notice).
So, is everyone using MQTT/WS (which requires another broker component, rather than simple servers)? Or gRPC?
Is everyone just re-implementing it from scratch? (It's one of those things that seems easy enough to do oneself, but I guess you just end up with a half-baked, poorly specified, broken version of the thing I'm looking for...)
Or is there something fundamentally broken with the idea of serving streams of data from a server over WS?
There are two primary classes of WebSocket libraries: those that implement the protocol and leave the rest to the developer, and those that build on top of the protocol with various additional features commonly required by realtime messaging applications, such as restoring lost connections, pub/sub and channels, authentication, authorization, etc.
The latter variety often requires that their own libraries be used on the client-side, rather than just using the raw WebSocket API provided by the browser. As such, it becomes crucial to make sure you’re happy with how they work and what they’re offering. You may find yourself locked into your chosen solution’s way of doing things once it has been integrated into your architecture, and any issues with reliability, performance, and extensibility may come back to bite you.
ws, faye-websockets, socket.io, μWebSockets and SocketCluster are some good open-source options.
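To illustrate the "small layer on top" the question mentions, here is a rough server-side sketch using the ws package; the message envelope ({ type, topic, data }) is just an ad-hoc convention, not a standard:

const WebSocket = require('ws'); // npm install ws

const wss = new WebSocket.Server({ port: 8081 });
const subscriptions = new Map(); // topic -> Set of subscribed sockets

wss.on('connection', (socket) => {
  socket.on('message', (raw) => {
    const msg = JSON.parse(raw);
    if (msg.type === 'subscribe') {
      if (!subscriptions.has(msg.topic)) subscriptions.set(msg.topic, new Set());
      subscriptions.get(msg.topic).add(socket);
    } else if (msg.type === 'publish') {
      // Fan the payload out to every current subscriber of the topic.
      for (const sub of subscriptions.get(msg.topic) || []) {
        sub.send(JSON.stringify({ type: 'event', topic: msg.topic, data: msg.data }));
      }
    }
  });
  socket.on('close', () => {
    for (const subs of subscriptions.values()) subs.delete(socket);
  });
});

Heartbeats, authentication, reconnection, and message history are exactly the pieces such a sketch leaves out, and they are what the higher-level libraries and hosted services discussed in this answer take care of.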
The number of concurrent connections a server can handle is rarely the bottleneck when it comes to server load. Most decent WebSocket servers can support thousands of concurrent connections, but what’s the workload required to process and respond to messages once the WebSocket server process has handled receipt of the actual data?
Typically there will be all kinds of potential concerns, such as reading and writing to and from a database, integration with a game server, allocation and management of resources for each client, and so forth.
As soon as one machine is unable to cope with the workload, you’ll need to start adding additional servers, which means now you’ll need to start thinking about load-balancing, synchronization of messages among clients connected to different servers, generalized access to client state irrespective of connection lifespan or the specific server that the client is connected to – the list goes on and on.
There’s a lot involved when implementing support for the WebSocket protocol, not just in terms of client and server implementation details, but also with respect to support for other transports to ensure robust support for different client environments, as well as broader concerns, such as authentication and authorization, guaranteed message delivery, reliable message ordering, historical message retention, and so forth. A data stream network such as Ably Realtime would be a good option to use in such cases if you'd rather avoid re-inventing the wheel.
There's a nice piece on WebSockets, Pub/Sub, and all issues related to scaling that I'd recommend reading.
Full disclosure: I'm a Developer Advocate for Ably but I hope this genuinely answers your question.

What is difference between json-rpc and json-api?

Could anybody explain the advantages of using JSON-RPC over JSON-API and vice versa? Both formats are JSON-based, but where should I use one, and where the other?
Note: I may come across as a little biased. I am the author of the Json-RPC.net server library.
Json-RPC is a remote procedure call specification. There are multiple libraries you can use to communicate using that protocol. It is not REST based and is transport agnostic. You can run it over HTTP, as is very common, but you can also use it over a socket or any other transport you find appropriate, so it is quite flexible in that regard. You can also make server-to-client requests as well as client-to-server requests by hosting the RPC server on either the client or the server.
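For reference, a JSON-RPC 2.0 exchange looks roughly like this; the method name and parameters are made up for illustration:

Request:  { "jsonrpc": "2.0", "method": "getBook", "params": { "id": 42 }, "id": 1 }
Response: { "jsonrpc": "2.0", "result": { "id": 42, "title": "Some Book" }, "id": 1 }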
Json-API is a specification for building REST APIs. There are multiple libraries you can use to get started with it. In contrast to Json-RPC, it requires you to host it on an HTTP server. You cannot invoke functions on the client with it, and you cannot run it over a non-HTTP transport protocol. Being REST based, it excels at providing information about resources. If you want an API that is based around the idea of Create, Read, Update, Delete on some collections of resources, then this may be a good choice.
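By contrast, a Json-API response describes a resource along with the links needed to browse it; a rough, hypothetical example:

{
  "data": {
    "type": "books",
    "id": "42",
    "attributes": { "title": "Some Book" },
    "links": { "self": "/books/42" }
  }
}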
Json-API is going to be better if your API is resource-based, and you want your API to be browsable by a human without setting up documentation for it. Though that human would likely need to be in the software engineering field to make any sense of it.
Json-RPC is going to be better if your API is function based, or you want the flexibility it provides. Json-RPC can still be used to manipulate resources by creating Create, Read, Update, and Delete functions for your resources, but you don't get the browsability, since it is not REST based. It can still be explored (not browsed) by a human by generating documentation based on the functions you expose.
A popular example of something that uses Json-RPC is Bitcoin.
There are a lot of popular REST-based APIs, and Json-API is a spec with a bunch of tools to help you do REST right.
--
Note: Neither of these (Json-RPC or Json-API) is ideal when you consider developer time, performance, or efficient use of network resources.
If you care about performance, efficiency, or developer time, take a look at Google's gRPC, which is fantastic in those regards and can reduce developer time even more than a REST API, since client and server code can be generated from a protocol definition file.

Connect Sproutcore App to MySQL Database

I'm trying to build my first SproutCore app and I'm struggling to connect it to a MySQL database or any data source other than fixtures. I can't seem to find ANY tutorial except this one from 2009, which is marked as deprecated: http://wiki.sproutcore.com/w/page/12413058/Todos%2007-Hooking%20Up%20to%20the%20Backend .
Do people usually not connect SproutCore apps to a database? If they do, how do they find out how? Or does the above-mentioned tutorial still work? A lot of the gem commands in the introduction already seem to differ from the official SproutCore getting-started guide.
SproutCore apps, as client-side "in-browser" apps, cannot connect directly to a MySQL or any other non-browser database. The application itself runs only within the user's browser (it's just HTML, CSS & JavaScript once built and deployed) and typically accesses any external data via XHR requests to an API or APIs. Therefore, you will need to create a service wrapper around your MySQL database in order for your client-side app to be able to load and update data.
There are two things worth mentioning. The first is that since the SproutCore app contains all of your user interface and a great deal of business logic, your API can be quite simple and should only return raw data (such as JSON). The second is that the client-server design, while more tedious to implement, is absolutely necessary in practice, because you can never trust the client-side code, which is in the hands of a possibly nefarious user. Therefore, your API should also act as the final gatekeeper that validates all requests from the client.
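As an illustration of such a service wrapper (not SproutCore-specific), here is a minimal Node.js sketch using Express and the mysql2 driver; the table, columns, and credentials are hypothetical:

const express = require('express'); // npm install express mysql2
const mysql = require('mysql2/promise');

const pool = mysql.createPool({ host: 'localhost', user: 'app', password: 'secret', database: 'todos' });
const app = express();
app.use(express.json());

// Return raw JSON only; the SproutCore client renders it.
app.get('/tasks', async (req, res) => {
  const [rows] = await pool.query('SELECT id, description, is_done FROM tasks');
  res.json(rows);
});

// Validate on the server side; never trust the client.
app.post('/tasks', async (req, res) => {
  const description = (req.body.description || '').trim();
  if (!description) return res.status(400).json({ error: 'description required' });
  const [result] = await pool.query('INSERT INTO tasks (description, is_done) VALUES (?, false)', [description]);
  res.status(201).json({ id: result.insertId, description, is_done: false });
});

app.listen(3000);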
This tutorial I found helped me a lot. It's very brief and demonstrates how to implement a very simple login app, how to send POST requests (triggered by the login button action) to the backend server, and how to asynchronously process the response inside the SproutCore app:
http://hawkins.io/2011/04/sproutcore_login_tutorial/

JSON only between backend and frontend

I'm designing the architecture for a new web application.
I think that communications between the backend (server) and the frontend should be JSON only.
Here are my arguments:
It's the client's responsibility to manipulate and present data in its own way. The server should just send the client the raw information it needs.
JSON is lightweight, and my application might be used by remote clients over poor mobile connections.
It allows multiple front-end developments (desktop devices, mobile devices) and has the potential to become an API for other developers.
I can't see any counter-argument to this approach, considering that we have the front-end skills in-house to do almost everything we need from raw JSON information.
Could you provide counter-arguments to this JSON-only choice so that I can make a more informed choice?
There must be some, as a lot of backend frameworks (think of the PHP ones) still advertise HTML templating for sending HTML-formatted responses to clients.
Thanks
UPDATE: Even though I researched the topic before, I found a similar and very interesting post: Separate REST JSON API server and client?
There are many front-end frameworks already on the market which support JSON very efficiently; some of them are Backbone, Underscore, Angular, etc. If we talk about the backend, we generally use REST-based communication for this type of application. So I think this type of architecture already exists in the market and works very well, especially for mobile-based applications.
Although this question is dead, I think I should try to weigh in.
For all the reasons you stated and more, communicating between the back-end and front-end via JSON only is maybe the best way available, as it provides a more compartmentalized structure for your web application and at the same time drastically reduces the data sent over your users' connections.
However, some drawbacks that are a direct consequence of this are:
Need for a lot more JavaScript front-end development (as the HTML structure is not sent by the server and needs to be created on the client)
It shifts the pressure from the server to the client, so there is more JavaScript for the client to run (this can sometimes be a problem, especially for mobile users)
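To make the trade-off concrete, here is a small Express sketch contrasting the two styles; the routes and data are hypothetical:

const express = require('express'); // npm install express
const app = express();
const articles = [{ id: 1, title: 'Hello' }];

// JSON-only style: the server ships raw data; the client builds the HTML.
app.get('/api/articles', (req, res) => {
  res.json(articles);
});

// Server-rendered style: the server builds the HTML (inline here; usually via a template engine).
app.get('/articles', (req, res) => {
  res.send('<ul>' + articles.map((a) => '<li>' + a.title + '</li>').join('') + '</ul>');
});

app.listen(3000);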

In which domains are message oriented middleware like AMQP useful?

What problems do MOMs (Message Oriented Middleware) solve? Scalability? Integration?
In which domains are they typically used, and in which domains are they typically not used?
For example, is Google using such a solution for its main search engine or to power GMail?
What about big websites like Walmart, eBay, FedEx (pretty much a Java shop) and buy.com (pretty much an MS shop)? Does MOM solve a need there?
Does it make any sense when you're writing a web app where you control the server side and have a homogeneous environment (say tens of Amazon EC2 instances all running Linux + Java JVMs) and where the clients are, well, web browsers?
Does it make sense for desktop apps that need to communicate with a server?
Or is it 'only' for big enterprise stuff where you typically have a happy mix of countless different systems that need to communicate in one way or another?
I'm a bit confused as to what they're useful for, and I think that with examples of where they're appropriate and where they're not, I could better understand their use.
This is a great question.
The main uses of messaging are: scaling, offloading work, integration, monitoring, event handling, routing, networking, push, mobility, buffering, queueing, task sharing, alerts, management, logging, batch, data delivery, pubsub, multicast, audit, scheduling, ... and more. Basically: anything where you need data but don't want to make a database request. (Caching is another, longer story).
Another way of looking at this is to notice that many applications used to be built by assuming that users (people) would perform actions that would be fulfilled by executing a transaction on a database (including reads, writes). But today, many actions are not user-initiated. Instead they are application-initiated. For example "tell me when the book that I want to buy is in stock". The best way to solve this class of problems is with messaging of some sort. Whether you call it middleware or web push or real time salad dressing does not matter. It's all messaging.
When you enable applications to initiate or react to events, then it is much easier to scale because your architecture can be based on loosely coupled components. It is also much easier to integrate those components if your messaging is based on a stable, scalable, serviceable tool, preferably using open standard APIs and protocols.
I hope this helps. We try to maintain a list of useful links about messaging here.
Please get in touch with questions and comments on any of this, we are dead easy to find.
To address your specific questions:
In which domains are they typically used, and in which domains are they typically not used?
Like databases, messaging systems crop up everywhere.
For example, is Google using such a solution for its main search engine or to power GMail?
Google uses a lot of home grown technology, but a lot of their open source contributions and known use cases suggest that messaging is (or should be) central to some of the main services.
What about big websites like Walmart, eBay, FedEx (pretty much a Java shop) and buy.com (pretty much an MS shop)? Does MOM solve a need there?
Very much so.
An example use case is scaling web page requests. When the user makes a web request, the web server puts it onto a queue for background processing. This means that the web server can keep working while the request is processed. It also means that the web server does not need to know how the request is handled, making system maintenance, upgrade and rollback much simpler because the main parts are 'decoupled'.
So, anyway, the web request gets processed by a back-end service, or possibly by many services, e.g. 'look up book titles', 'draw shopping cart', 'get advertisement', 'check user account'... Finally all the results get put onto another queue, ready for collection and user response by the web server. Typically the system will include a timeout of around 100ms so that any late requests just get thrown away. The user sees anything that got processed in the time interval. This is one reason why some large ecommerce sites have pages that appear to load in stages.
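A bare-bones version of that offloading pattern, sketched with the Node.js amqplib client against RabbitMQ; the queue name and job contents are made up:

const amqp = require('amqplib'); // npm install amqplib

async function main() {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue('web-requests');

  // Web tier: enqueue the work and go back to serving other requests.
  ch.sendToQueue('web-requests', Buffer.from(JSON.stringify({ action: 'lookupBookTitles', query: 'node' })));

  // Back-end worker: consume and process requests as they arrive.
  ch.consume('web-requests', (msg) => {
    const job = JSON.parse(msg.content.toString());
    console.log('processing', job.action); // do the real work here
    ch.ack(msg);
  });
}

main();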
There are many more use cases...
Does it make any sense when you're writing a web app where you control the server side and have a homogeneous environment (say tens of Amazon EC2 instances all running Linux + Java JVMs) and where the clients are, well, web browsers?
Definitely. If you have an unknown, or unbounded, number of users, server side instances, and application latencies, then it makes sense to use messaging, even if just as a scalable substrate for non-blocking RPC.
Does it make sense for desktop apps that need to communicate with a server?
In lots of cases. One very common case is when the server pushes events to the desktop app, e.g. game events, tweets, price feeds in finance, system alerts...
Or is it 'only' for big enterprise stuff where you typically have a happy mix of countless different systems that need to communicate in one way or another?
Definitely not only for those 'legacy integration' cases but they are important too. At RabbitMQ, the biggest customers we have in terms of pure scale or message volume are cloud providers and big web application providers.
I will give only one answer, from prior experience: take a look at the middleware employed by big companies. Middleware has one purpose: to glue disconnected systems (written in disparate languages) together so that they can interact with one another and streamline the business process. Entera, which I have experience with, creates a middle layer in which a Unix box running processes written in C interacts with the mainframe system (DB2, COBOL) via a front-end written in PowerBuilder (I am not naming the company!).
From the description I have given, Entera is middleware that provides a number of things: smooth integration of the flow of data regardless of endian format, and the ability for different languages to talk to the middleware broker (a CORBA- or DCE-like process, conforming to The Open Group standards, that listens on a particular port) specified by an IDL which makes a process appear to be local. If you understand the terminology used in Remoting under Microsoft's .NET Framework, you are not far off the mark! The middleware generates stubs which are linked at compile time, and it manages process creation, hosting off a port, and multi-threading at run time. Modern front-ends (such as .NET, Java, PowerBuilder, even the unspeakable VB6... ok, VB.NET for the purists out there) can interact with it by opening a connection to the specified port on a particular IP address and, using the generated stubs, can interact with it directly.
Obviously, from what was described, you can see how legacy systems can have new life breathed into them, and thus the process can scale; the major downside is the cost factor, which can run into thousands of dollars. Big companies that use mainframes as their back-end processing systems for billing/invoicing and that generate huge revenue can obviously afford such an expensive product; to them it would seem like throwing pennies into a pool of water, because middleware that prolongs and revitalizes the business process can extend the business a good number of years into the future without worrying about the 'legacy' tag attached to it.
Incidentally, I carried this out as part of my thesis for my BSc in Information Systems, which covered this commercial front-end. There was an open-source version of the middleware available on SourceForge called FreeDCE, but development efforts have declined or stopped.
Edit:
#cocotwo: That is exactly what middleware does; as you said, it is a plumbing tool. Message-oriented middleware is not really heard of much, AFAIK, because I would imagine the processes (functions) would need to be called as if they were locally visible within the application domain of the front-end to make them easy to interact with.
Using messages may have advantages over RPC calls in that messages are queued in a safe holding area in the event that a network disconnection occurs; there may be some data caching involved to allow the front-end to continue regardless. It would be useful in cases like 'updating the status of a particular billing/invoice number', a one-way data write to the back-end via the middleware.
OK, big companies have advanced systems infrastructure, in that technicians are on hand around the clock to ensure smooth delivery of data flow, so that would have to be factored in. The company I worked with had an IBM Global Support contract to fulfil in order to ensure maximum uptime of 99% with six nines after the decimal point, with hot-swapping/balanced clusters/mirroring systems in place...
Whereas with RPC, if a disconnection occurs, the front-end would have to be restarted or would have to handle the disconnection event. It really depends on whether the message-queueing middleware handles each message in real time and passes results back to the front-end immediately...
This is where each (message-queueing and RPC-related middleware) has its strengths and weaknesses, along with cost-mitigation factors such as support, maximum uptime, development effort, and training. That's a biggie here, as middleware is really proprietary (despite following The Open Group layouts/standards) and complex to set up and to glue together via scripts.
Good answers and discussion here. Our consulting team has two preferred "messaging" solutions: RabbitMQ and NXTera, a high-speed RPC middleware and the contemporary version of Entera mentioned above. My partners and I have developed several solutions using RabbitMQ; it is the best tool available in that space right now. Additionally, I happen to work for the company that makes NXTera/Entera.
From experience I can clearly say that both of these products meet the need for reliability and low maintenance as discussed above. There are situations where a messaging service, like RabbitMQ, is the right choice -- where publish/subscribe, certified delivery, queuing, or store-and-forward are required.
In other cases, RPCs (remote procedure calls) are the best and fastest solutions for transactional and distributed processing for enterprise or cloud-based applications. When it is right to use an RPC, but SOAP/.NET (yes, these are RPC implementations) are too slow, expensive, or complex, a lightweight, high-speed RPC middleware like NXTera/Entera is the right choice for us.
There is some use-case overlap between RPC middleware and message-oriented middleware, and where there is, you can use either successfully. But both are strong and dependable choices.
The large companies I work with use both RPC and MoM side by side. As far as Internet companies go, Google (Protocol Buffers) and Facebook (Thrift) show that RPCs have a role to play in modern web and cloud-based development.