How do you build and deploy a scalable web services infrastructure? - language-agnostic

I have a client asking this for a requirement and haven't done this before, what does he mean by web service infrastructure?

That phrase encompasses a wide variety of technical aspects. Your infrastructure is all of the components that make up the systems that run a web business or application, including hardware. So it refers to your server and network setup, your bandwidth and connections in and out, your database setup, backup solutions, web server software, code deployment methods, and anything else used to successfully run a web business with high reliability and uptime and low error and bug incidents.
In order to make such a thing scalable, you have to architect all these components together into something that will work smoothly with growth over time. A scalable architecture should be flexible enough to handle sudden traffic spikes.
Methods used to facilitate scalability include replicated databases, clustered web servers, load balancers, RAID disk striping, and network switching. Your code has to take much of this into account.
It's a tough service to provide.

First thing that comes to mind was the Enterprise service bus.
He probably means some sort of "infrastructure" to run a lot of complex interacting web services.

Either an enterprise application that you call via a web service that can run on many instances of a web application server, or run a single instance that are very nicely multithreaded and scale to many CPUs, or deploying loads of different webservices that all talk to each other, often via message queues, until you have something that breaks all the time and requires a huge team of people to maintain. Might as well throw in a load of virtual machines to have a virtualised, scalable, re-deployable web service infrastructure (i.e., loads of tomcats or jbosses in linux VMs ready to deply as needed, one app per VM).
Then there is physical scalability. Is there enough CPU power for your needs? Is there enough bandwidth between physical nodes to send all these messages and SOAP transactions between machines? Is there enough storage? Is the storage available on a fast, low latency interconnect? Is the database nicely fed with CPU power, bandwidth, a disc system that doesn't lag. Is there a database backup. How about when a single machine can't handle the load of a particular function - then you need load balancers, although these are good for redundancy and software updates whilst remaining live as well.
Is there a site backup? Or are you scaling globally - will there be multiple data centres around the globe? Do you have redundant links to the internet from each data centre? What happens when a site goes down? How is data replicated between sites, to reduce inter-site communications, and how do these data caches and pushes work?
And so on and so forth. But your client probably just wants a web service that can be load balanced without thrashing (i.e., two or more instances can share data/sessions/etc, depends on the application really), with easy database configuration and backup. Ease of deployment is desirable, so make the install simple. Or even provide a Linux VM for them to add to their VM infrastructure. Talk to their sysadmin to see what they currently do.

This phrase is often used as a marketing term from companies who sell some part of what they'll call a "scalable web services infrastructure".
Try to find out from the client exactly what they need. Do they have existing web services? Do they have existing business logic they've decided to expose as web services? Do they have customers who are asking to be able to access your client's systems through web services?
Does your client even know what a web service is?

Related

Cloud based web service for Web Applications

My web application uses PHP/MySQL on the server side to fetch and store data in a database. The DB size will increase with the user base, and can be huge. The application has been built and run on a conventional server, i.e no "cloud" specific code has been written (I have no experience with cloud systems; Is running services on them any different from running on a normal server?)
My concerns:
1. If I buy space on Amazon Elastic Compute Cloud, can I directly port all my code to the new server, or do I have to use some APIs specific to that? Since it's pay as you go, it's highly suitable for such a requirement.
2. What are the other options for hosting a web service that would require large server space? How might apps like Whatsapp be doing the same?
Thanks.
1) The answer to the first question depends on the type of service you're buying. Cloud comes in many forms, from Infrastructure as a Service (which basically offers you hardware as a service on which you can run your software stack) to Software as a Service (e.g. Gmail, which lets you use applications (or APIs) hosted in the cloud ).
The best alternative, in your case I think it is Platform as a Service (e.g Heroku) which defines a set of technologies supported by the provider and how to use them.
Either case, how difficult it is depends on your app and the specification of the service and the level of support offered, so you have to dig a little deeper (starting with guides of how to deploy a similar app would be a good choice).
2) Startups and other medium size companies use cloud providers such as Amazon, Rackspace etc and when they reach a certain size tend to build their own data centers (e.g Zynga). There's a threshold beyond which is better to manage your own infrastructure instead of buying services from others.

Hosted Database v Cloud Database

I have looked everywhere...
whats the difference between a hosted database and a cloud database they seem like the same things?
Thanks
Both "hosted database" and "cloud database" mean that the database lives on the servers of some external provider/hoster.
The hoster might even be the same in both cases.
The main difference is that the "cloud" plans are usually meant to scale more (at a higher monthly fee), so you'd use them when you expect your site to get huge soon and need to quickly adjust server capacity when needed.
On the other hand, "hosted" plans are not that expensive, but have more limitations (server speed, database size...) and are more suited for "small" websites.
Of course this isn't by any means an "official" description of the two terms, but that's the impression that I get every time I see "cloud" or "hosted" webspaces/databases/services/whatever.
It depends on the context in which they're being used, but, yes, they usually mean the same thing. When I see the term cloud database being used they are usually referencing some cloud platform like Amazon EC2 or Microsoft Azure instead of GoDaddy or HostGator or something. Plus, cloud is the new buzz word - I'm sure it sells better. Lol.
As Christian Specht said, the cloud servers really scale more. So why you need more scaling? and why there are many featured options in cloud database service selection?
Things are not like before. Before smartphones and earlier pc operating systems, users gets information from the server only when they log on the specific web page using their credentials. But now apps like facebook shows notifications, provide ads etc and collect/push other data in parallel while we are looking at something else irrelevant.
Hosted database are reliable to access the database when users log onto the web page. But in case of the lastest smart phone applications, it needs to access the database everytime starting from its birth (installation on the device). So for each installation, the minimum workload over the server is expected to raise up.
So more scalability is required here. More simultaneous connections, Input/Output operation requests are expected daily. So with the dedicated servers with the core purpose, and with the configurable package selection based on your expectation of user count and bandwidth usage, Cloud Service is not yet another marketing term, but is a helpful service.

economical way of scaling a php+mysql website

My partner and I are trying to start a website hosted in cloud. It has pretty heavy ajax traffic and the backend handles money transactions so we need ACID in some of the DB tables.
Currently everything is running off a single server. Some of the AJAX traffic are cached in text files.
Question:
What's the best way to scale the database server? I thought about moving mysql to separate instances and do master-master duplication. However this seems tough and I heard I might lose ACID properties even with InnoDB? Is Amazon RDS a good solution?
The web server is relatively stateless except for some custom log files and the ajax cache files. What's a good way to scale to multiple web servers? I guess the custom log files can be moved to a reliable shared file system or DB but not sure what to do about the AJAX cache file coherency across multiple servers. (I dont care about losing /var/log/* if web server dies)
For performance it might be cheaper to go with larger instance with more cores and memory but eventually I would need redundancy so wondering what's the best way to do this cheaply.
thanks
take a look at this post. there is plenty of presentations on the net discussing scalability. few things i suggest to keep in mind:
plan early for the data sharding [even if you are not going to do it immediately]
try using mechanisms like memcached to limit number of queries sent to the database
prepare to serve static content from other domain, in the longer run - from ngin-x-alike server and later CDN
redundancy - depends on your needs. is 'read-only' mode acceptable for your site? if so - go with mysql replication + rsync of static files and in case of failover have your site work in that mode till you recover the master node. if you need high availability - then take a look either at drbd replication [at least for mysql] or setup with automated promotion of slave server to become master node.
you might find following interesting:
http://yoshinorimatsunobu.blogspot.com/2011/08/mysql-mha-support-for-multi-master.html
http://mysqlperformanceblog.com
http://highscalability.com
http://google.com - search for scalability, lamp, failover... there are tones of case studies and horror stories from the trench lines :-]
Another option is using a scaleable platform such as Amazon Web Services. You can start out with a micro instance and configure load balancing to fire up more instances as needed.
Once you determine average resource requirements you can then resize your image to larger or smaller depending on your needs.
http://aws.amazon.com
http://tuts.pinehead.tv/2011/06/26/creating-an-amazon-ec2-instance-with-linux-lamp-stack/
http://tuts.pinehead.tv/2011/09/11/how-to-use-amazon-rds-relation-database-service-to-host-mysql/
Amazon allows you to either load balance or change instance size based off demand.

Cloud Database Service Latency/Performance

I am running a heavy traffic site and our server is beginning to get to its limits, at the moment the entire LAMP stack is on one box (not ideal).
I would like to move the database onto it's own box or onto a cloud service, but from my previous experience moving the database off the same box as the webserver increases the latency of reads quite dramatically slowing down the site.
Is using a cloud service for this going to overcome this problem, because as far as I can tell its essentially the same situation (as moving it onto a separate box in my control)? In which case why is there so much popularity around cloud based database services at the moment?
Are cloud based database services so quick that the latency of reads is so low that its almost like having it on the same box in the same datacentre?
Using a cloud service just for your database won't help your situation.
If you only move the database, you're physically placing it in a remote location - which will always increase latencies, no matter how powerful the hardware serving the content.
I would suggest that you will see a benefit in hosting your database on a separate machines from your web server, so long as they are physically next to each other sharing a dedicated network (as already suggested).
If you wanted to explore the benefits of cloud services, I would suggest only doing so if you can move both database and web server together. Furthermore, it's really only of benefit if you explore load balancing across multiple web-servers and/or replicated databases. (The ability to scale dynamically is a major benefit of cloud based platforms).
Clouds are about paying someone else to manage the infrastructure so you don't have to. They also come with some nice benefits about being able to rapidly acquire infrastructure since you don't have to wait for physical machines to be landed you can simply tap into the "cloud's" unused capacity. Sure people build features on top of this infrastructure to make it easier to scale (this is usually programming against a certain model).
If you are thinking about a cloud when are you planning on moving to 10 servers...or 100? Do you deal with traffic that comes in large bursts where the peaks in your traffic are very high?
Since you are talking about moving to a second box I don't think you need to have the cloud discussion yet. Just add a database server and use caching like e4c5 recommended.
There will be increased latency going across the network, but it shouldn't be that noticeable. Gigabit ethernet is pretty fast. When you have tried splitting the boxes, how did you access the other box? You should be using a local, internal IP address (i.e. 192.168.#.#). If you are not, then your requests may get routed over the internet, even though the boxes are physically next to each other.
Moving to a cloud won't solve your problems if the servers aren't networked properly.

Scaling up from 1 Web Server + 1 DB Server

We are Web 2.0 company that built a hosted Content Management solution from the ground up using LAMP. In short, people log into our backend to manage their website content and then use our API to extract that content. This API gets plugged into templates that can be hosted anywhere on the interwebs.
Scaling for us has progressed as follows:
Shared hosting (1and1)
Dedicated single server hosting (Rackspace)
1 Web Server, 1 DB Server (Rackspace)
1 Backend Web Server, 1 API Web Server, 1 DB Server
Memcache, caching, caching, caching.
The question is, what's next for us? Every time one of our sites are dugg or mentioned in a popular website, our API server gets crushed with too many connections. Or every time our DB server gets overrun with queries, our Web server requests back up.
This is obviously the 'next problem' for any company like ours and I was wondering if you could point me in some directions.
I am currently attracted to the virtualization solutions (like EC2) but need some pointers on what to consider.
What/where/how to scale is dependent on what your issues are. Since you've been hit a few times, and you know it's the API server, you need to identify what's actually causing the issue.
Is it DB lookup times?
A volume of requests that the web server just can't handle even though they're shortlived?
API requests take too long to process? (independent of DB lookups, e.g., does the code take a bit to run)?
Once you identify WHAT the problem is, you should have a pretty clear picture of what you need to do. If it's just volume of requests, and it's the API server, you just need more web servers (and code changes to allow horizontal scaling) or a beefier web server. If it's API requests taking too long, you're looking at code optimizations. There's never a 1-shot fix when it comes to scalability.
The most common scaling issues have to do with slow (2-3 seconds) execution of the actual code for each request, which in turn leads to more web servers, which leads to more database interactions (for cross-server sessions, etc.) which leads to database performance issues. High performance, server independent code with memcache (I actually prefer a wrapper around memcache so the application doesn't know/care where it gets the data from, just that it gets it and the translation layer handles DB/memcache lookups as well as populating memcache).
Depends really if your bottleneck is reads or writes. Scaling writes is much harder than reads.
It also depends on how much data you have in the database.
If your database is small, but cannot cope with the read load, you can deploy enough ram that it fits in ram. If it still cannot cope, you can add read-replicas, possibly on the same box as your web servers, this will give you good read-scalability - the number of slaves from one MySQL master is quite high and will depend chiefly on the write workload.
If you need to scale writes, that's a totally different game. To do that you'll need to split your data out, either horizontally (partitioning / sharding) or vertically (functional partitioning etc) so that you can spread the workload over several write servers which do not need to do each others' work.
I'm not sure what EC2 can do for you, it essentially offers slow, high latency machines with nonpersistent discs and low IO performance on the end of a more-or-less nonexistent SLA. I guess it might be useful in your case as you can provision them relatively quickly - provided you're just using them as read-replicas and you don't have too much data (remember they have nonpersistent discs and sucky IO)
What is the level of scaling you are looking for? Is it a stop-gap solution e.g. scale vertically? If it is a more strategic scaling project, does your current architecture support scaling horizontally?