Why is Cloud Foundry able to scale so quickly? - paas

Hi, I am a new learner here. I have been going through the Cloud Foundry docs and have not been able to find much about how Cloud Foundry is able to scale so quickly.
What is there behind the scenes that makes it so fast and easy to scale?

I have worked with Pivotal Cloud Foundry and will try to explain concepts with it.
Here is the link to the Diego Architecture.
Please look closely at the architecture diagram.
The diagram depicts the components within PCF and how they interact.
Cloud Foundry is an ecosystem containing a lot of components. The cells in the diagram are the Diego Cells. These are the actual VMs where containers are hosted and run.
At a basic level, containers are in fact folders on a host VM with runtime isolation. A container does not know anything about any other container.
When you push an app to PCF, the first thing that happens is that the app is staged. Here is an article explaining How Diego Stages Buildpack Applications.
Notice the Blobstore. As part of the staging process, the Cloud Controller uploads a ready-to-go blob to the Blobstore. This blob contains the OS and monitoring tools (both sourced from the stemcell), the runtime (JVM, API tools, etc., from the buildpack), and your application archive.
Cloud Foundry runs one and only one application in a container. That is very important. If the app dies, the container is reclaimed. A new container will be spun up in its place.
Spinning up a brand-new VM is expensive in terms of time and resources. Spinning up a new container on an existing VM is relatively cheap. And PCF already has a ready-to-go blob available.
So, if there is a need to scale up, or if an app instance crashes, PCF is able to spin up a new instance quickly.
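Conceptually, the scaling loop boils down to comparing the desired instance count with the number of healthy containers and starting new containers from the stored blob to cover the gap. Here is a minimal Python sketch of that idea; the class and function names are invented for illustration and are not part of any Cloud Foundry or Diego API.

    # Minimal sketch of the reconcile-and-place idea behind Diego's scaling.
    # Illustrative only: these names are invented, not Cloud Foundry APIs.
    class Cell:
        """Stands in for a Diego cell: a VM that can host many containers."""
        def __init__(self):
            self.containers = []

        def start_container(self, droplet):
            # Starting a container is cheap: unpack the ready-to-go blob into
            # an isolated directory on this already-running VM.
            self.containers.append({"droplet": droplet, "healthy": True})

    def reconcile(desired_instances, cells, droplet):
        """Start enough containers to close the gap between desired and actual."""
        actual = sum(1 for cell in cells for c in cell.containers if c["healthy"])
        missing = max(desired_instances - actual, 0)
        for _ in range(missing):
            # Pick the least-loaded cell (a stand-in for Diego's auction).
            target = min(cells, key=lambda cell: len(cell.containers))
            target.start_container(droplet)
        return missing

    cells = [Cell(), Cell(), Cell()]
    # "cf scale my-app -i 5" ultimately just raises the desired count.
    started = reconcile(desired_instances=5, cells=cells, droplet="my-app-blob")
    print(f"started {started} new container(s)")

Because no new VM has to boot, the expensive work (staging and building the blob) is done once up front, and each additional instance is only a container start.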
There are a ton of things involved in this process. The articles will walk you through it.
Hope this helps.

Related

Migrating cordova app from Google Maps to OSM

I need some directions. My mobile app is heavily dependent on Google Maps for the following components: maps, geocoding/geolocation, autocomplete, distanceMatrix and directionsMatrix. However, I am running into quota issues and they are getting worse and worse. I need an equally reliable solution, but without the quotas, and am considering OSM. I have my own Unix API server I can install OSM on, but I am having a hard time finding a complete install package.
Everything I read is separate packages to install and configure for each of the components I need to migrate to. Is there not a comprehensive OSM package that has all of the components built into it? I keep thinking there should be a single package that is all integrated to work together but I can't find one.
If one exists, please point me in the right direction. If one doesn't, can you please recommend the best, and easiest to use, OSM packages that meet my list of required components? Thanks in advance.
OSM consists of several components. First there is the map data, which can be seen as raw data, no software involved. Depending on your goals you will likely need database software, which is usually PostgreSQL. For drawing a map you will need a renderer, for address searching a geocoder, for directions a routing software, and so on.
Showing a map: Use one of the available tile providers or render your own tiles
Geocoding: Take a look at Nominatim or Photon. Photon is based on Nominatim and adds some features, most importantly autocompletion. There are other search engines available as well.
Routing: GraphHopper or OSRM. More alternatives available, check the list of OSM online routers.
Similarly to tile servers, most of these tools can be either run by yourself or accessed via various online providers. Online providers usually have quotas, whereas running your own software is limited only by your own resources.
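To give a feel for the geocoding piece, here is a rough Python sketch of calling a Nominatim-style search endpoint. The URL and parameters reflect Nominatim's public search API as I understand it; if you self-host, swap in your own base URL, and note that the public instance has its own usage policy (identify your application and keep the request rate low).

    # Rough sketch: geocode an address against a Nominatim-style endpoint.
    # Parameters reflect Nominatim's documented search API; verify against the
    # docs of whichever instance (public or self-hosted) you end up using.
    import requests

    def geocode(query, base_url="https://nominatim.openstreetmap.org"):
        response = requests.get(
            f"{base_url}/search",
            params={"q": query, "format": "json", "limit": 1},
            # The public instance requires an identifying User-Agent.
            headers={"User-Agent": "my-cordova-migration-test"},
            timeout=10,
        )
        response.raise_for_status()
        results = response.json()
        if not results:
            return None
        return float(results[0]["lat"]), float(results[0]["lon"])

    print(geocode("Brandenburg Gate, Berlin"))

Routing works much the same way against an OSRM or GraphHopper instance, just with a different endpoint and response format.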

OpenShift scaling on a specific (software) condition

I'm looking for a scaling mechanism on an OpenStack cloud, and in the process I found OpenShift. My scenario is something like this: we have a distributed system with many agents standing on many nodes. One node contains a Message Broker that directs the traffic. We want to monitor the Message Broker node; if a queue is full, we scale out the agent nodes handling that queue. In brief, we monitor one node to scale other nodes.
We use the OpenStack cloud now. In OpenStack, I found Heat and Ceilometer, which are able to create alarms and scale out nodes. However, alarms are based only on general info like CPU, RAM, network usage, etc. (not inside-VM info).
Then I searched for a layer above: PaaS. I found that OpenShift can handle scaling apps. But as far as I know, the scaling mechanism of OpenShift is to duplicate the app based on network traffic and put an HAProxy in front.
Am I right that OpenShift can't monitor software-specific data? Is there any other tool that suits our scenario?
You can try using this script (https://github.com/openshift/origin-server/blob/master/cartridges/openshift-origin-cartridge-haproxy/usr/bin/haproxy_ctld.rb) to control how your gears are scaled, but I believe that it is still experimental. Make sure that you read through all of the comments and understand what you are doing before making any changes. You might also consider spinning up a second scaled application to test this on before messing with your production application.
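As a rough illustration of the kind of custom trigger you are describing, the loop below polls the broker's queue depth and calls whatever scale-out hook your platform exposes (the haproxy_ctld script above, a cloud API, etc.). Both get_queue_depth and scale_out are placeholders you would implement against your actual broker and platform; the numbers are made up.

    # Sketch of a software-condition autoscaler: poll queue depth, scale out.
    # get_queue_depth() and scale_out() are placeholders for your broker API
    # and your platform's scaling hook; the thresholds are arbitrary examples.
    import time

    QUEUE_HIGH_WATER = 1000   # messages; tune to your workload
    CHECK_INTERVAL = 30       # seconds between checks

    def autoscale_loop(get_queue_depth, scale_out):
        while True:
            depth = get_queue_depth()
            if depth > QUEUE_HIGH_WATER:
                # The queue is backing up: add another agent node/gear.
                scale_out()
            time.sleep(CHECK_INTERVAL)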

Additional tutorials or worked examples of best practice for configuring multi-VM projects in Google Compute Engine

I was hoping people would know of more samples and best-practice guides for configuring systems on Google Compute Engine, so I can gain more experience in deploying them and apply the knowledge to my own projects.
I had a look at https://developers.google.com/compute/docs/samples-and-videos#samples, which runs through deploying a Cassandra cluster and Hadoop using scripts, but I was hoping there might be more available, including on the following topics:
Load balancing web servers across zones, including configuring networking, firewalls and the load balancer
Fronting Tomcat servers with Apache behind a load balancer
Multi-network systems in Compute Engine using subnetting
Multi-project systems and how to structure them for reliability and secure interoperability
Ideally they would be easy-to-follow projects that you build starting from a blank project and end up with a sample site running across multiple VMs and zones with recommended security in place, a bit like the videos you see for GAE coding examples that go from hello world to something more complex, but for infrastructure rather than code.
Does anyone know of any?
You may want to check out https://cloud.google.com/developers/#resources for tutorials and samples, as well as http://googlecloudplatform.github.io
I'm new to the forums so I can only post two links. Taking a quick look I see several topics that may be of interest to you:
Managing Hadoop Clusters on Compute Engine
Auto Scaling on the Google Cloud Platform
Apache Hadoop, Hive, and Pig on Google Compute Engine
Compute Engine Load Balancing in Action
I hope this helps!

Justification for using the cloud?

I'm currently writing a large scale ASP.Net web app.
One of the things I can't find out about is how to justify when to use the cloud. E.g., when should I use Google App Engine or Azure?
Also, when would I want to use BigTable over a standard DBMS such as SQL Server?
Thanks
Cloud computing is all about scalability. It allows you to scale up AND scale down without having to rework your designs.
It works well for small sites, since you are only paying for resources used, but if you need to scale up, it just happens automatically (provided your application was designed for the cloud).
Also, there are theoretically much better tools in place for maintaining uptime and reliability in the cloud. For example, a system upgrade can happen without stopping your service, since the cloud computing platforms can automatically bring servers into or out of service for your application.
There's been a lot of talk about that from the Azure devs.
Also, there can be a financial motivation for using the cloud. Using a hosted cloud architecture can be less expensive than managing the multiple servers (DB, web, etc) that would be required for a traditional site, at least up front. As your usage goes up, the cost follows, but in theory, it can be more cost effective.
I'm not too familiar with anything except App Engine and EC2.
I'll try to add something to the previous answers:
The best thing about App Engine is that it's free until you attract a certain number of users, and you are charged for what your application uses; idle time is not charged.
BigTable may differ from an RDBMS architecturally, but from the perspective of a developer using it, it's not that different.
Another good thing is that Python is supported. The bad thing is that the standard library is crippled.
Also, you don't have full control over your data in the cloud (App Engine); what I mean is you can't completely restrict the people from Google from taking a peek at what you store there.
This question is very closely related to another question asked today:
"When shouldnt-you-use-a-relational-database?"
Relational databases and non-relational databases (like BigTable) address different needs. Not only in scale and performance, but in the structure and usage of the data.
The "Cloud" as I understand it is about scalability primarily. That is, the architecture refers to a capability to increase capacity in a scalable way.
Also, the Cloud is used frequently in reference to the Software-as-a-Service (SaaS) model, where someone else takes care of the servers, but that's an independent issue from the Cloud architecture. I.e. you could operate your own set of servers in a Cloud architecture.
So the justification for using the Cloud architecture is that you have an application that has a variable need for computing capacity. So it would be overkill to have N servers dedicated to match your peak level of activity. The Cloud allows you to vary your usage of the servers as your level of activity grows (and diminishes) over time.
The justification for using a SaaS model is that you don't want to be in the business of operating a data center. You're willing to relinquish some control and pay for the service, so that you can leave operation details to the experts in that technology. They handle backups, hardware failures, upgrades, 24x7 operation, etc. You handle your application and your business.
I recommend you subscribe to and read the High Scalability blog, especially some of the most visited posts such as those about the architecture of various large sites, as you will learn a lot from it that may help you make a decision. There is no hard rule as to when you should or should not use a cloud service or move from a relational database to a key-value system like BigTable.
One upside of cloud services in any case is that if you build your application with them, it will be immediately scalable and require much less rework later on if you need that kind of performance. However, to avoid premature optimisation, it would be wise to be sure that you actually need that kind of scalability before you decide to build your app on such a platform.
There are several concepts to wrap your head around when using a datastore system like BigTable as well, such as not being able to just slam out writes like you would in a relational database, and having to precalculate a lot of your data rather than deriving it on the fly from database queries.
Although again, you can learn a lot from reading the above-mentioned blog and related posts about YouTube, PlentyOfFish, Google, etc.
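To make the precalculation point concrete, here is a toy Python sketch. In a relational database you might run SELECT COUNT(*) whenever you need a total; in a BigTable-style datastore you typically maintain the aggregate yourself at write time. The dictionary below just stands in for the datastore; none of this is a real BigTable API.

    # Toy illustration of precalculating aggregates at write time instead of
    # querying them on demand. The dict stands in for a BigTable-style store;
    # this is not a real datastore API.
    datastore = {}

    def add_post(user_id, post):
        user = datastore.setdefault(user_id, {"posts": [], "post_count": 0})
        user["posts"].append(post)
        # Update the precomputed count as part of the write, since there is
        # no cheap COUNT(*) to fall back on at read time.
        user["post_count"] += 1

    def get_post_count(user_id):
        return datastore.get(user_id, {}).get("post_count", 0)

    add_post("alice", "hello world")
    add_post("alice", "second post")
    print(get_post_count("alice"))  # 2, without scanning the posts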
You say you are "currently writing a large scale ASP.NET app". If you have made significant progress on it, you are already past the point where you can justify using Google App Engine or Azure. Both require significantly different architectures than you would have built with a traditional application, due to language support, database differences, and maturity.
Google App Engine is Python-only, so switching to it would require a complete rewrite.
BigTable is not a relational database and requires very different coding patterns. SQL Data Services was originally announced as non-relational as well, but is moving to be more relational. I have not seen how close to a standard MSSQL database it currently is.
I would consider Google app engine to be a relatively immature platform so far. Database functionality is limited, you cannot run background processes, profiling and performance tuning tools are limited at best. Azure is currently in limited community preview, and so is not even available to ship a product on today.
While there are many very valid reasons to use a cloud architecture, moving to it will require significantly different architectures. Think about what effect changing that architecture (and possibly waiting for platform availability) will do to your release date.
If you are early in your project, cloud vs. not cloud is a great question to ask. If you are well on your way, I think that the importance of getting to shipping code and leveraging the work you have already put in should trump any benefits of the cloud you may see.

Does anyone use Virtualization to create a quicker disaster recovery of a development environment?

I'm getting pretty tired of my development box dying and then having to reinstall a laundry list of tools that I use in development.
This time I think I'm going to set the development environment up on a VirtualBox VM and save it to an external HDD, so that way I can bring the development environment back up quickly after I fix the real computer.
It seems to be like a good way to make a "hardware agnostic backup" and be able to get back up to speed quickly after a disaster.
Has anybody tried this? How well did it work? Did it save you time?
I used to virtualize all my development environments using VirtualBox.
Basically, I have a Debian VirtualBox image file burned to a DVD. When I have a new project, I copy it to one of my external HDDs and customize it for the project.
Once the project is delivered, I copy the image from my external HDD to a blank DVD and file it.
I've done this with good success. We even had this in our QA environment, and we'd also make use of undo disks, so that if we wanted to test, for example, Microsoft patches, we could roll the box back to its previous state.
The only case where we had issues was with SQL Server, particularly if you do a lot of disk activity. We had two VMs replicating gigs of data between each other, hosted on the same physical box. The disks just couldn't keep up; however, for all the other tiers it worked like a breeze.
One cool idea I just saw a presentation on is using VirtualBox with the host running OpenSolaris with ZFS. That makes it easy to take a snapshot of your image(s) and roll back to the snapshot when things go wrong, or when you want to restore to a known state for QA purposes.
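VirtualBox itself can handle the snapshot/rollback part as well. Below is a minimal Python sketch that drives it through the VBoxManage command line; the VM name is hypothetical, and you should double-check the snapshot subcommands against "VBoxManage snapshot --help" on your version.

    # Minimal sketch: take and restore VirtualBox snapshots of a dev VM via
    # the VBoxManage CLI. The VM name is hypothetical; verify the subcommands
    # with "VBoxManage snapshot --help" for your VirtualBox version.
    import subprocess

    VM_NAME = "dev-environment"  # hypothetical VM name

    def take_snapshot(name):
        subprocess.run(["VBoxManage", "snapshot", VM_NAME, "take", name], check=True)

    def restore_snapshot(name):
        # The VM should be powered off before restoring.
        subprocess.run(["VBoxManage", "snapshot", VM_NAME, "restore", name], check=True)

    take_snapshot("clean-install")
    # ... install patches, experiment, break things ...
    restore_snapshot("clean-install")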
I keep all development on virtual machines. In a multi-developer shop this allows for rapid deployment of a new development environment if someone fries their VM (via service pack or whatever) and allows a new developer to join the project almost immediately.
I'm reading the question much differently than the rest of you guys. I read it as the OP asking about keeping an image of a fresh install as a VM, then, when a server needs to be redeployed, you can restore from a backup of the VM.
In this case, the VM is nothing more than a different way of maintaining an image of an OS install, and if it works, it's not a half bad idea, IMO.
In the companies I work with, I encourage the use of network installable operating systems. With the right up-front work you can configure a boot server on your office network which will install your base operating system, all the drivers you need for your hardware, and all the software you'll use. Not only will this bail you out in a disaster scenario where you lose a machine, but it makes deploying hardware for new employees trivial.
This is easier with Linux than it is with Windows or Mac, but the latter two can work in this manner too.
I use the same network install methods for deploying servers in a live environment too.
The Virtualisation approach isn't a bad answer to the same problem, but to me it doesn't seem quite as clean.
That's not the way to go.
When you are developing you want to have many tools, some of which require a lot of computing power. Keep in mind that VirtualBox (IIRC, I couldn't find it on the VBox website) only emulates a PIV.
At the moment only one VM product simulates a dual-core CPU, and that's very new. This is important because there are race conditions that can only be seen on multi-CPU machines, so you want to test your code under multiple CPUs/cores.
I think a simpler and better thing to do is make a disk image of your system and configuration partitions, restore it once a month to keep a clean system, and restore it whenever your system gets mussed.
Now a quick word about Windows, since the other systems where I have done this are no problem. The partitions that you image should not be changed in between. That's not a problem for other OSes, but some brilliant person decided to put profiles on Windows smack dab in the system files. I simply make it a point not to put anything in my profile (or on my desktop, which is in my profile) that I'm not willing to lose.