Is there any alert available for cluster CPU utilization in ambari? - hadoop2

If yes, how do I use that alert? And if it is not available, how do I create a custom alert to get cluster CPU utilization through Ambari?

Ambari alerts are tied to services. Many services already have alerts for CPU utilization on nodes that run certain components. For instance, YARN has a ResourceManager CPU Utilization alert. This alert will be set up for any node running the YARN/RESOURCEMANAGER component. HBase has a similar alert, HBase Master CPU Utilization, which would be set up on nodes running the HBASE/MASTER component.
It's not clear from your question what your cluster layout is nor how many nodes your cluster consists of. So I can't give you a definitive answer for your setup.
In general, if you had a component on every node in your cluster you could set up an alert for that component that monitors CPU Utilization. If you don't have a component on every node, then you would have to set up several such alerts across components to achieve what you want.
You can add or alter alerts via the Ambari UI by clicking the Alerts tab. In that view you can adjust alert notifications and alert settings by clicking the actions button and selecting the corresponding item from the drop down.
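The alert definitions mentioned above can also be listed outside the UI. This is a minimal sketch, assuming the Ambari REST API is reachable and serves alert definitions under /api/v1/clusters/<cluster>/alert_definitions. The host, cluster name, and credential variables are placeholders, not values from the question.

```shell
#!/bin/sh
# Placeholder values (assumptions): point these at your own Ambari server and cluster.
AMBARI_URL="${AMBARI_URL:-http://ambari.example.com:8080}"
CLUSTER_NAME="${CLUSTER_NAME:-mycluster}"

# Print the REST URL that lists the cluster's alert definitions.
alert_definitions_url() {
    printf '%s/api/v1/clusters/%s/alert_definitions\n' "$AMBARI_URL" "$CLUSTER_NAME"
}

# Fetch the definitions and keep only the lines that mention CPU.
# AMBARI_USER and AMBARI_PASSWORD are assumed to be set in the environment.
list_cpu_alert_definitions() {
    curl -s -u "$AMBARI_USER:$AMBARI_PASSWORD" -H 'X-Requested-By: ambari' \
        "$(alert_definitions_url)" | grep -i 'cpu'
}
```

Calling list_cpu_alert_definitions should show which CPU alerts already exist, which helps when deciding which components still need one.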

Related

Metrics Explorer - Disk Bytes Used unavailable

I am trying to set up a custom dashboard for my Compute Engine instances. One of the metrics that I want to report on is the amount of free disk space available on each VM. I noticed that "disk bytes used" is one of the available metrics, but I cannot select it unless I disable the "Only Show Active" option.
I have the recently released Ops Agent installed and running on the VMs.
I can't seem to find any documentation referencing this particular metric and how to get it working.
Has anyone tried this and figured out the magic solution?
Here is what I did in order to get the metrics working in a replicated environment:
1.- I created 2 GCE instances (Debian and RedHat).
2.- Navigate to the Monitoring section, and select Dashboards.
3.- Select the VM Instances Dashboard from the Dashboard List.
4.- From the Instances section, I selected both instances and clicked on Install Agents; it will open the Cloud Shell VM and auto populate the command to install the Ops Agent.
5.- You might need to wait up to 10 minutes to get the agents connected to the Monitoring Dashboard.
6.- Once you see the Ops Agent running on the instances, select the Infrastructure Summary Dashboard.
7.- Scroll down the Dashboard, and you will see the Top Disk Used (Agent) section populated.
If you prefer, you can also create a custom Dashboard.
1.- On the Left Panel, navigate to the Metrics Explorer section.
2.- In the Resource type, select VM Instance (gce_instance), and, at the bottom, unselect the “Only show active” checkbox.
3.- In the Metric dropdown menu, select Disk Usage, and also unselect the “Only show active” checkbox.
4.- You need to wait at least 1 minute to see the chart populated.
Here is the full list of metrics accepted for gce_compute

Better scaling of a Spring application using @Async for a large number of database insert or update queries

I have a Spring REST controller whose sole purpose is to create or update a record every time the mobile client launches the app. This URL is requested only when the user launches the app or when it returns to the foreground after a resume (i.e., when the user presses the device home button to do something else and, after a while, presses the app icon to bring it back to the foreground from memory).
The expected number of requests for this URL is around 600 requests per minute.
To scale this application, is it better to put the database (MySQL) create/update logic of the Spring controller in a separate thread, or to use Spring's @Async feature?
That way it won't hold the system port for very long, and one machine can handle a large number of requests before my web server (GlassFish) pushes requests to the waiting queue.
Also,
The expected table size or the number of records in this table is around 10M - 30M.
I personally wouldn't bother with an async call at least to start with. Create a jmeter script and fire some load at it and see how it performs.
If you start to see a slowdown, using @Async with a ThreadPoolExecutor behind it (which you can easily configure) is certainly a valid option. With this type of thing, configuring the queue size and number of threads (both for your thread pool executor and your web container) is a bit of a black art, which is where something like JMeter and a good profiling tool such as YourKit come into their own.

GCE disk disappeared from europe-west1-a zone

I attempted to start a VM instance using a predefined disk in zone europe-west1-a. I have been using the disk for a number of weeks. The VM startup never completed (the start activity did not complete and the instance never appeared in the VM list, so presumably the VM failed to start up).
When I tried to start the VM a second time, the disk was no longer available. The disk is also not listed under the "Disks" tab of Compute Engine.
I have the Bronze support package, so I can't create a ticket with Google.
Any suggestions on what to do?
You should send a question about this using the grey "Send feedback" link at the bottom right of Developers Console page. This may require looking at the logs for your specific project/account and is not something that we can solve here on StackOverflow.

Changing the number of replicas for an existing bucket in Couchbase?

I know it is not possible with Couchbase 2.2, but is it possible to change the number of replicas in 2.5?
Thank you
Yes, you can change the replica count in 2.5 from the web console. The steps are listed below.
Click on the Data Buckets link.
Click the arrow to the left of the bucket name to expand the bucket details.
Click the Edit button.
The replica count appears in the Replicas section. Change the quantity there.
Click Save.
Click the Server Nodes link. After a short time (refresh if necessary), you will see a red message indicating that a rebalance is required. Rebalance your cluster from the button on that page. A rebalance is required so that Couchbase can distribute the new set of replica documents across the cluster.
You can also find info about a 'working but not officially supported' way to change the settings in 2.2 at https://groups.google.com/forum/#!topic/couchbase/ClqBDavQIkk.

Can a webserver determine if it's the active node of an HA failover system without hard coding anything on the server itself?

I can think of a few hacks using ping, the box name, and the HA shared name, but I think that they lead to data leakage.
Should a box even know it's part of an HA cluster or what that cluster name is? Is this more a function of DNS? Is there some API exposed for boxes to join an HA cluster and request the id of the currently active node?
I want to differentiate between the inactive node and active node in alerting mechanisms for a running program. If the active node is alerting I want to hit a pager and on the inactive node I want to send an email. Pushing the determination into the alerting layer moves the same problem elsewhere.
EASY SOLUTION: Polling the server from an external agent that connects through the network makes any shell game of who is the active node a moot point. To clarify this the only thing that will page is the remote agent monitoring the real. Each box can send emails all day long for all I care.
It really depends on the HA system you're using.
For example, if your system uses a shared IP and the traffic is managed by some hardware box, then it can be hard to determine if a certain box is a master or slave. That will depend on a specific solution really... As long as you can add a custom script to the supervisor, you should be ok - for example the controller can ping a daemon on the master server every second. In the alerting script, simply check if the time of the last ping < 2 sec...
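The last-ping check described above could be sketched as follows. This is only an illustration under assumptions: the controller is assumed to touch a heartbeat file on the master every second, and the file path and the is_master name are hypothetical. It relies on GNU stat (on BSD, stat -f %m is the rough equivalent).

```shell
#!/bin/sh
# Hypothetical path: the controller is assumed to touch this file on each ping.
HEARTBEAT_FILE="${HEARTBEAT_FILE:-/var/run/ha-master.ping}"
# A ping younger than this many seconds counts as fresh.
MAX_AGE="${MAX_AGE:-2}"

# Succeeds (exit status 0) only if the heartbeat file exists and was
# modified less than MAX_AGE seconds ago.
is_master() {
    [ -f "$HEARTBEAT_FILE" ] || return 1
    now=$(date +%s)
    last=$(stat -c %Y "$HEARTBEAT_FILE") || return 1
    [ $((now - last)) -lt "$MAX_AGE" ]
}
```

An alerting script could then branch on is_master to decide between paging and sending an email.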
If your system doesn't have a supervisor / controller node, but each node tries to determine the state itself, you can have more problems. If a split brain occurs, you can end up with both slaves or both masters, so your alerting software will be wrong in both cases. Gadgets that can ensure only one live node (STONITH and others) could help.
On the other hand, in the second scenario, if the HA software works on both hosts properly, you should be able to obtain the master/slave information straight from it. It has to know its own state at any time, because it's one of its main functions. In most HA solutions you should be able to either get the current state, or add some code to run when the state changes. Heartbeat offers both.
I wouldn't worry about the edge cases like a split brain though. Almost any situation when you lose connection between the clustered nodes will be more important than the stuff that happens on the separate nodes :)
If the thing you care about is really logging / alerting only, then ideally you could have a separate logger box which gets all the information about the current network / cluster status. An external box will probably have a better idea of how to deal with the situation. If your cluster gets DoS'ed / disconnected from the network / loses power, the cluster itself won't send you any alert. A redundant pair of independent monitors can save you from that.
I'm not sure why you mentioned DNS - due to its refresh time it shouldn't be a source of any "real-time" cluster information.
One way is to get the box to export its idea of whether it is active into your monitoring. From there you can predicate paging/emailing on this status (with a race condition around failover), and alert on none/too many systems believing they are active.
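The "none/too many systems believing they are active" check could look roughly like this. It is a sketch that assumes each node's exported status has already been collected into lines of the form "<host> <active|standby>"; that format and the check_active_count name are made up for illustration.

```shell
#!/bin/sh
# Reads "<host> <active|standby>" lines on stdin (a hypothetical export format)
# and prints ok, no-active, or multiple-active.
check_active_count() {
    # Count the hosts that report themselves as active.
    count=$(awk '$2 == "active" { n++ } END { print n + 0 }')
    case "$count" in
        0) echo "no-active" ;;
        1) echo "ok" ;;
        *) echo "multiple-active" ;;
    esac
}
```

Anything other than "ok" would be worth an alert of its own, independent of the per-node paging or emailing.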
Another option is to monitor the active system via a DNS alias (or some other method to address the active system) and page on that. Then also monitor all the systems, both active and inactive, and email on that. This will cause duplicate alerts for the active system, but that's probably okay.
It's hard to be more specific without knowing more about your setup.
As a rule, the machines in an HA cluster shouldn't really know which one is active. There's one exception, mind, and that's with cronjobs. At work, we have an HA cluster on top of which some rather important services run. Some of those services have cronjobs, and we only want them running on the active box. To do that, we use this shell script:
#!/bin/sh
# Run the given command only if this box currently holds the cluster IP.
HA_CLUSTER_IP=0.0.0.0
if ip addr | grep -qF "$HA_CLUSTER_IP"; then
    eval "$@"
fi
(Note that this is running on Debian.) What this does is check to see if the current box is the active one within the cluster (replace 0.0.0.0 with the external IP of your HA cluster), and if so, executes the command passed in as arguments to the script. This ensures that one and only one box is ever actually executing the cronjobs.
Other than that, there's really no reason I can think of why you'd need to know which box is the active one.
UPDATE: Our HA cluster uses Heartbeat to assign the cluster's external IP address as a secondary address to the active machine in the cluster. Programmatically, you can check to see if your machine is the current active box by calling gethostbyname(), and iterating over the data returned until you either get to the end or you find the cluster's IP in the list.
Without hard-coding.... ? I assume you mean some native heartbeat query, not sure. However, you could use ifconfig, HA creates a virtual interface on whatever interface it is configured to run on. For instance if HA was configured on eth0 then it would create a virtual interface of eth0:0, but only on the active node.
Therefore you could do a simple query of the ifconfig output to determine whether the server is the active node, for example if eth0 was the configured interface:
ACTIVE_NODE=`ifconfig | grep -c 'eth0:0'`
That will set the $ACTIVE_NODE variable to 1 (if active) or 0 (if standby). Hope that helps.
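To tie this back to the paging-versus-email goal in the question, here is a small sketch that turns the same interface check into a choice of alert channel. The alert_channel name and the "page"/"email" outputs are illustrative assumptions; the function reads the interface listing from stdin so it can be fed real ifconfig output.

```shell
#!/bin/sh
# Reads interface listing text (for example the output of ifconfig) on stdin and
# prints "page" if the alias named in $1 is present, "email" otherwise.
alert_channel() {
    if grep -qF "$1"; then
        echo page
    else
        echo email
    fi
}
```

Typical use would be something like: ifconfig | alert_channel eth0:0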
http://www.of-networks.co.uk