The situation is that about 50,000 electronic devices are going to connect to a web service written in Node.js once per minute. Each one is going to send a POST request containing some JSON data.
All of this data needs to be secured.
The web service is going to receive those requests and save the data to a database.
Read requests are also possible, to fetch some data from the DB.
I am thinking of building a system based on the following infrastructure:
Node.js + memcached + (MySQL Cluster OR Couchbase)
So, what memory requirements do I need to assign to my web server to be able to handle all these connections? Assume that, in the worst case, I would have 50,000 concurrent requests.
And what if I use SSL to secure the connections? Does that add too much overhead per connection?
Should I scale the system out to handle them?
What do you suggest?
Many thanks in advance!
Of course, it is impossible to provide any meaningful numbers here, since this is always very workload-specific. I would recommend simply designing a scalable, expandable system architecture from the very beginning, and using JMeter (https://jmeter.apache.org/) for load testing. Then you will be able to scale from thousands of connections to practically unlimited numbers.
Here is an article on reaching 1,000,000 connections: http://www.slideshare.net/sh1mmer/a-million-connections-and-beyond-nodejs-at-scale
Remember that your Node.js application runs your JavaScript on a single thread, which means performance will degrade badly as the number of concurrent requests grows.
What you can do to increase performance is create one Node process for each core on your machine and put all of them behind a proxy (say, nginx); you can also use multiple machines for your app.
If your requests only hit memcached, your API won't degrade much. But once you start querying MySQL, those slower queries will start holding up your other requests.
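As a rough sketch of that cache-first (cache-aside) idea: the MySQL side below uses the mysql2 promise API, while the cacheGet/cacheSet helpers, key scheme, table and column names, and TTL are all hypothetical stand-ins for your real memcached client and schema.

// Cache-aside sketch: check the cache first, only fall back to MySQL on a miss.
const mysql = require('mysql2/promise');

// Example pool config; replace with your real credentials/schema.
const pool = mysql.createPool({ host: 'localhost', user: 'app', database: 'devices' });

// Stand-in for a real memcached client (e.g. memjs or memcached on npm):
// an in-process Map with TTL, just to keep the sketch self-contained.
const localCache = new Map();
async function cacheGet(key) {
  const hit = localCache.get(key);
  return hit && hit.expires > Date.now() ? hit.value : null;
}
async function cacheSet(key, value, ttlSeconds) {
  localCache.set(key, { value, expires: Date.now() + ttlSeconds * 1000 });
}

async function getLatestReading(deviceId) {
  const key = `reading:${deviceId}`;            // hypothetical key scheme
  const cached = await cacheGet(key);
  if (cached !== null) return JSON.parse(cached);

  // Cache miss: query MySQL once, then populate the cache so the next
  // requests for this device don't queue up behind the database.
  const [rows] = await pool.query(
    'SELECT payload FROM readings WHERE device_id = ? ORDER BY ts DESC LIMIT 1',
    [deviceId]
  );
  if (rows.length === 0) return null;
  await cacheSet(key, JSON.stringify(rows[0]), 60); // 60s TTL, arbitrary example
  return rows[0];
}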
Edit:
As suggested in the comments, you could also use the cluster module to fork worker processes and let them compete with each other for incoming requests. (Each worker runs in its own process, which lets you use all the cores.)
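A minimal sketch of that cluster setup (the port and the request handling are example placeholders):

// Fork one worker per CPU core; the master only manages workers,
// and the workers share the same listening socket.
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} died, starting a replacement`);
    cluster.fork(); // simple self-healing: replace crashed workers
  });
} else {
  http.createServer((req, res) => {
    // Handle the device POSTs here (parse the JSON, write to the cache/DB, etc.).
    res.writeHead(200);
    res.end('ok');
  }).listen(8000); // example port; nginx would proxy to this
}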
See also: Node.js on multi-core machines
Related
I have a somewhat theoretical question.
I have a microservice that handles 200-300 req/sec at peak. For each request, we make several calls to the DB (MySQL) and return some information.
Given this RPS, does it make sense to cache the data at the application level?
Or can modern MySQL servers easily withstand such a load, relying on their own DB-level cache instead?
Thanks
So I have developed this website with Symfony3 and Doctrine. I have one major concern about performance with MySQL, and more specifically the number of simultaneously open connections.
For the moment, one to five users are online on the website. What happens if, let's say, 1,500 users connect within one minute? Do Symfony3 or Doctrine handle this kind of situation? How can I be sure the website doesn't go down, giving me the MySQL "Too many connections" error?
And if I go up to 5,000? And 10,000? The server has 4 GB of RAM and a 2.40 GHz single-core processor, but I wouldn't worry about the hardware as I'm more concerned about MySQL.
These situations have already happened in the past, but back then I was running the website on WordPress with the W3 Total Cache plugin. Should I consider using a cache layer such as memcached or something similar?
In short, I'm concerned about the website becoming unavailable in case of sudden high traffic (I first thought of the MySQL "Too many connections" error, but I might be missing something even more important).
Thanks for enlightening me on this one, as I'm not fully aware of the performance issues around Symfony.
I believe it does open one connection per visitor. Regardless of whether it does or not, however, neither Symfony nor Doctrine has a magic bullet to handle every load/connection scenario.
Why don't you use a load-testing tool (there are many) and see how it actually pans out? In my experience, predicting bottlenecks is useless, as they always crop up where you least expect them.
For example, the MySQL connection limit is only one part of the optimisation puzzle. It's no good just worrying about connection limits; you need to respond to web requests as quickly and efficiently as possible to free up MySQL connections (and other resources your app is using). So if your server is slow, you will run out of connections (or some other resource) almost immediately under significant load, regardless of the MySQL connection limit.
That said, those server specifications seem a little low for 5-10k users per minute. I wouldn't expect a machine like that to handle that kind of load without some serious optimisation/caching/etc.
The Symfony performance page is a good starting point, and there is also a good article on caching; there's a ton of material available on the subject. Good luck! :)
If you use PHP-FPM, it depends on pm.max_children in fpm/pool.d/www.conf.
pm.max_children refers to the maximum number of concurrent PHP-FPM processes allowed to exist in such a pool. If the volume of incoming requests requires the creation of more PHP-FPM processes than the number allowed by the max_children limit, those additional requests are backlogged in a queue to await service.
So when pm.max_children > max_connections (in my.cnf) and the number of active users exceeds max_connections, you will get "Too many connections".
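As a rough illustration of how the two limits relate (the values below are arbitrary examples, not recommendations):

; fpm/pool.d/www.conf (example values)
pm = dynamic
pm.max_children = 100        ; at most 100 concurrent PHP-FPM workers in this pool

# my.cnf (example values)
[mysqld]
# Should comfortably exceed the total pm.max_children across all app servers
# if each PHP worker may hold one MySQL connection at a time.
max_connections = 200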
I am developing a web application using WebSockets which needs real-time data.
The number of clients using the web application will be over 100,000.
The server-side WebSocket code is written in Java. Can a single WebSocket server handle this number of connections?
If not, how can I achieve this? I have to use WebSockets only.
WebSocket servers, like any other TCP-based server, can open huge numbers of connections; each open connection is backed by a file descriptor. You can find out the max (system-wide) FDs easily enough on Linux:
% cat /proc/sys/fs/file-max
165038
There are system-wide limits, and there are kernel parameters for per-user limits (and shell-level things like "ulimit"). By the way, you'll need to edit /etc/sysctl.conf for your FD changes to survive a reboot.
And of course you can increase this number to whatever you want (with a proportional impact on kernel memory).
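For reference, a minimal sketch of the knobs involved (the numbers are arbitrary examples; tune them for your workload):

# Per-process limit for the current shell
ulimit -n

# System-wide maximum; appending to /etc/sysctl.conf makes it persist across reboots
echo 'fs.file-max = 1000000' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

# Per-user limits live in /etc/security/limits.conf, e.g. for a hypothetical "appuser":
#   appuser  soft  nofile  500000
#   appuser  hard  nofile  500000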
Or servers can do tricks to multiplex a single connection.
But the real question is, what is the profile of the data that will flow over the connections? Will you have 100K users getting 1 64-byte message per day? Or are those 100K users getting 50 1K messages a second? Can the WebSocket server shard its connections over multiple NICs (i.e., spread the I/O load)? Are the messages all encrypted and therefore in need of a lot of CPU? How easily can you cluster your WebSocket server so that failover is easy for you and painless for your users? Is your server mission/business critical? That is, can you afford to have 100K users disappear if a disaster occurs? There are many questions to consider when you are thinking about the scalability of a WebSocket server.
In our labs, we can create millions of connections on a server (and many more in a cluster). In the real world, there are other 'scale' factors to consider in a production deployment besides file descriptors. Hope this helps.
Full disclosure: I work for Kaazing, a WS vendor.
As FrankG explained above, the number of WebSocket connections depends on the use case.
Here are two benchmarks of the MigratoryData WebSocket Server for two very different use cases, which also detail the system configuration (note, however, that the system configuration is only a detail; the high scalability comes from the architecture of MigratoryData, which was designed for real-time websites with millions of users).
In one use case MigratoryData scaled up to 10 million concurrent connections (while delivering ~1 Gbps messaging):
https://mrotaru.wordpress.com/2016/01/20/migratorydata-makes-its-c10m-scalability-record-more-robust-with-zing-jvm-achieve-near-1-gbps-messaging-to-10-million-concurrent-users-with-only-15-milliseconds-consistent-latency/
In another use case MigratoryData scaled up to 192,000 concurrent connections (while delivering ~9 Gbps):
https://mrotaru.wordpress.com/2013/03/27/migratorydata-demonstrates-record-breaking-8x-higher-websocket-scalability-than-competition/
These numbers were achieved on a single instance of the MigratoryData WebSocket Server. MigratoryData can also be clustered, so you can scale horizontally to any number of subscribers in an effective way.
Full disclosure: I work for MigratoryData.
My partner and I are trying to start a website hosted in the cloud. It has pretty heavy AJAX traffic, and the backend handles money transactions, so we need ACID guarantees for some of the DB tables.
Currently everything is running off a single server. Some of the AJAX responses are cached in text files.
Question:
What's the best way to scale the database server? I thought about moving MySQL to separate instances and doing master-master replication. However, this seems tough, and I heard I might lose ACID properties even with InnoDB. Is Amazon RDS a good solution?
The web server is relatively stateless except for some custom log files and the AJAX cache files. What's a good way to scale to multiple web servers? I guess the custom log files can be moved to a reliable shared file system or a DB, but I'm not sure what to do about AJAX cache file coherency across multiple servers. (I don't care about losing /var/log/* if a web server dies.)
For performance it might be cheaper to go with a larger instance with more cores and memory, but eventually I will need redundancy, so I'm wondering what the best way is to do this cheaply.
thanks
Take a look at this post. There are plenty of presentations on the net discussing scalability. A few things I suggest keeping in mind:
Plan early for data sharding [even if you are not going to do it immediately].
Try using mechanisms like memcached to limit the number of queries sent to the database.
Prepare to serve static content from another domain; in the longer run, from an nginx-like server and later a CDN.
Redundancy depends on your needs. Is a 'read-only' mode acceptable for your site? If so, go with MySQL replication + rsync of static files, and in case of failover have your site run in that mode until you recover the master node. If you need high availability, then look either at DRBD replication [at least for MySQL] or at a setup with automated promotion of a slave server to become the master node.
You might find the following interesting:
http://yoshinorimatsunobu.blogspot.com/2011/08/mysql-mha-support-for-multi-master.html
http://mysqlperformanceblog.com
http://highscalability.com
http://google.com - search for scalability, LAMP, failover... there are tons of case studies and horror stories from the trenches :-]
Another option is using a scalable platform such as Amazon Web Services. You can start out with a micro instance and configure load balancing to fire up more instances as needed.
Once you determine your average resource requirements, you can then resize your instance to something larger or smaller depending on your needs.
http://aws.amazon.com
http://tuts.pinehead.tv/2011/06/26/creating-an-amazon-ec2-instance-with-linux-lamp-stack/
http://tuts.pinehead.tv/2011/09/11/how-to-use-amazon-rds-relation-database-service-to-host-mysql/
Amazon allows you to either load balance or change the instance size based on demand.
I have a requirement to build a distributed Comet-based server for a large number of clients (over 500K concurrent) with high throughput. I'm currently investigating the possibility of using Tornado for its high efficiency in dealing with a high number of long-polling requests.
My concern is whether a single Tornado server could handle such a large number of long-polling clients. As an experiment, I would like to expand the Tornado chat demo (https://github.com/facebook/tornado/tree/master/demos/chat) to a distributed environment, i.e. have a bunch of Tornado chat servers running in parallel, each responsible for a changing set of clients.
I would appreciate any ideas/thoughts you have with regard to implementing such a scheme, or any references to relevant resources.
Thanks!
In general, to distribute the basic chat across several Tornado instances, you need a distributed message-passing mechanism. The most straightforward implementation is to just use some kind of message queue, like RabbitMQ (or one of its competitors), and publish fanout messages when a user types something, while all connections are listening.
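A minimal sketch of that fanout pattern, shown in Node.js with the amqplib client only to keep the code examples in this thread in one language (from Tornado you would do the same thing with a Python AMQP client such as pika); the exchange name 'chat' and the broadcastToLocalClients helper are hypothetical:

const amqp = require('amqplib');

// Hypothetical helper: push a message to every long-polling/WebSocket client
// currently connected to THIS chat server instance.
function broadcastToLocalClients(text) {
  console.log('deliver to local clients:', text);
}

async function startChatBus() {
  const conn = await amqp.connect('amqp://localhost'); // assumes a local RabbitMQ
  const ch = await conn.createChannel();

  // Fanout exchange: every bound queue gets a copy of every published message.
  await ch.assertExchange('chat', 'fanout', { durable: false });

  // Each server instance binds its own exclusive queue, so each instance
  // receives all messages and can relay them to its local connections.
  const { queue } = await ch.assertQueue('', { exclusive: true });
  await ch.bindQueue(queue, 'chat', '');
  await ch.consume(queue, (msg) => {
    if (msg) broadcastToLocalClients(msg.content.toString());
  }, { noAck: true });

  // Returns a publish function to call whenever a user types something.
  return (text) => ch.publish('chat', '', Buffer.from(text));
}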
My initial thought is to have an nginx server/reverse proxy on the front end and multiple Tornado instances in the back, one Tornado instance per process. Do some benchmarking on your machine to see how many Tornado instances running in separate processes it can handle; when you notice performance degrading, start doing the same thing on another machine.
Nginx will round-robin across all the servers you have, distributing the load over the long-polling Tornado servers/instances.
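A rough sketch of what that nginx side might look like (hostnames, ports, and timeouts are arbitrary examples):

# Example only: one upstream entry per Tornado instance/process,
# including instances on other machines as you add them.
upstream tornado_backends {
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
    server 10.0.0.2:8001;
}

server {
    listen 80;

    location / {
        proxy_pass http://tornado_backends;   # round-robin is nginx's default balancing method
        proxy_read_timeout 3600s;             # keep long-polling requests from timing out
        proxy_set_header Host $host;
    }
}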
I'm not really sure how RabbitMQ would be useful in this case.