how to create auto scale rule for HttpQueueLength with az monitor - azure-cli

I can easily create a basic autoscale rule, such as one based on CPU:
az monitor autoscale rule create -g xx --autoscale-name xxxx --scale out 2 --condition "CpuPercentage > 60 avg 1m"
But when I try to create a rule based on Http Queue Length or Http 5xx errors, the az monitor command throws an error:
az monitor autoscale rule create -g xx --autoscale-name xxxx --scale out 2 --condition "HttpQueueLength <= 1 sum 1m"
az monitor autoscale rule create -g xx --autoscale-name xxxx --scale out 2 --condition "Http5xx <= 50 sum 1m"
The error is:
--condition ["NAMESPACE"] METRIC {==,!=,>,>=,<,<=} THRESHOLD {avg,min,max,total,count} PERIOD
[where DIMENSION {==,!=} VALUE [or VALUE ...]
[and DIMENSION {==,!=} VALUE [or VALUE ...] ...]]
Can someone guide me?

Note the aggregations the usage message accepts: {avg, min, max, total, count} — there is no "sum". In Azure Monitor, Total is the Sum aggregation:
Sum – the sum of all values captured over the aggregation interval; sometimes referred to as the Total aggregation.
Count – the number of measurements captured over the aggregation interval. Count doesn't look at the value of the measurement, only the number of records.
Total tends to refer to the total for a given criterion (total requests, total jobs, etc.).
Refer to the supported metrics and explained metrics documentation for more information.
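Concretely, the commands from the question should go through once "sum" is replaced by "total" (the CLI's name for the Sum aggregation). A sketch keeping the question's placeholder resource names (xx, xxxx) and thresholds:

```shell
# Same rules as in the question, with the aggregation changed from
# "sum" to "total" (the only change needed to satisfy the grammar).
az monitor autoscale rule create -g xx --autoscale-name xxxx \
  --scale out 2 --condition "HttpQueueLength <= 1 total 1m"
az monitor autoscale rule create -g xx --autoscale-name xxxx \
  --scale out 2 --condition "Http5xx <= 50 total 1m"
```

Both conditions now match the documented shape METRIC {==,!=,>,>=,<,<=} THRESHOLD {avg,min,max,total,count} PERIOD.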

Related

How can I get a kernel's execution time with NSight Compute 2019 CLI?

Suppose I have an executable myapp which needs no command-line argument, and launches a CUDA kernel mykernel. I can invoke:
nv-nsight-cu-cli -k mykernel myapp
and get output looking like this:
==PROF== Connected to process 30446 (/path/to/myapp)
==PROF== Profiling "mykernel": 0%....50%....100% - 13 passes
==PROF== Disconnected from process 1234
[1234] myapp@127.0.0.1
mykernel(), 2020-Oct-25 01:23:45, Context 1, Stream 7
Section: GPU Speed Of Light
--------------------------------------------------------------------
Memory Frequency cycle/nsecond 1.62
SOL FB % 1.58
Elapsed Cycles cycle 4,421,067
SM Frequency cycle/nsecond 1.43
Memory [%] % 61.76
Duration msecond 3.07
SOL L2 % 0.79
SM Active Cycles cycle 4,390,420.69
(etc. etc.)
--------------------------------------------------------------------
(etc. etc. - other sections here)
So far, so good. But now I just want the overall kernel duration of mykernel - and no other output. Looking at nv-nsight-cu-cli --query-metrics, I see, among others:
gpu__time_duration incremental duration in nanoseconds; isolated measurement is same as gpu__time_active
gpu__time_active total duration in nanoseconds
So, it must be one of these, right? But when I run
nv-nsight-cu-cli -k mykernel myapp --metrics gpu__time_duration,gpu__time_active
I get:
==PROF== Connected to process 30446 (/path/to/myapp)
==PROF== Profiling "mykernel": 0%....50%....100% - 13 passes
==PROF== Disconnected from process 12345
[12345] myapp@127.0.0.1
mykernel(), 2020-Oct-25 12:34:56, Context 1, Stream 7
Section: GPU Speed Of Light
Section: Command line profiler metrics
---------------------------------------------------------------
gpu__time_active (!) n/a
gpu__time_duration (!) n/a
---------------------------------------------------------------
My questions:
Why am I getting "n/a" values?
How can I get the actual values I'm after, and nothing else?
Notes:
I'm using CUDA 10.2 with NSight Compute version 2019.5.0 (Build 27346997).
I realize I can filter the standard output stream of the unqualified invocation, but that's not what I'm after.
I actually just want the raw number, but I'm willing to settle for using --csv and taking the last field.
Couldn't find anything relevant in the nvprof transition guide.
tl;dr: You need to specify the appropriate 'submetric':
nv-nsight-cu-cli -k mykernel myapp --metrics gpu__time_active.avg
(Based on @RobertCrovella's comments.)
CUDA's profiling mechanism collects 'base metrics', which are indeed what --query-metrics lists. For each of these, multiple samples are taken. In version 2019.5 of NSight Compute you can't just get the raw samples; you can only get 'submetric' values.
'Submetrics' are essentially some aggregation of the sequence of samples into a scalar value. Different metrics have different kinds of submetrics (see this listing); for gpu__time_active, these are: .min, .max, .sum, .avg. Yes, if you're wondering - they're missing second-moment metrics like the variance or the sample standard deviation.
So, you must either specify one or more submetrics (see example above), or alternatively, upgrade to a newer version of NSight Compute, with which you actually can just get all the samples apparently.

Created factors with EFA, tried regressing (lm) with control variables - Error message "variable lengths differ"

EFA first-timer here!
I ran an Exploratory Factor Analysis (EFA) on a data set ("df1" = 1320 observations) with 50 variables by creating a subset with relevant variables only that have no missing values ("df2" = 301 observations).
I was able to filter 4 factors (19 variables in total).
Now I would like to take those 4 factors and regress them with control variables.
For instance: Factor 1 (df2$fa1) describes job satisfaction.
I would like to control for age and marital status.
Fa1Regression <- lm(df2$fa1 ~ df1$age + df1$marital)
However I receive the error message:
Error in model.frame.default(formula = df2$fa1 ~ df1$age + :
variable lengths differ (found for 'df1$age')
What can I do to run the regression correctly? Can I delete observations from df1 that are nonexistent in df2 so that the variable lengths are the same?
lm is having a problem regressing a latent factor on other coefficients. Instead, use the lavaan package, where your model statement would be myModel <- 'fa1 ~ age + marital'. Note also that the error itself comes from mixing columns of different lengths (df1 has 1320 rows, df2 has 301); whichever approach you take, all variables in one model must come from the same data frame.

Understanding nagios check_ping plugin output

Trying to understand the output of Nagios Plugin check_ping:
root@debian:/usr/local/nagios/libexec# ./check_ping -H 192.168.234.135 -w 10,1% -c 20,10%
PING CRITICAL - Packet loss = 60%, RTA = 1.26 ms|rta=1.256000ms;10.000000;20.000000;0.000000 pl=60%;1;10;0
root@debian:/usr/local/nagios/libexec#
As per the documentation, the performance data is displayed after the pipe ('|') symbol. I am trying to understand the fourth parameter of each category / section. In the above example, what do 0.000000 and 0 stand for?
First, here is the official documentation: Nagios Plugin Development Guidelines
Second, the numbers returned for each metric are, in order and separated by semicolons:
value [with unit of measurement]; warning threshold; critical threshold; minimum value; maximum value
In your case, only the warning, critical, and minimum values are reported after the value itself. So to answer your question, those numbers (0.000000 and 0) are the minimum (possible) values for rta and pl.
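The layout can be made explicit by splitting one perfdata chunk on the ';' separators — a plain-shell sketch using the rta chunk from the output above:

```shell
# Perfdata layout: value[UOM];warn;crit;min;max (trailing fields may be omitted)
echo 'rta=1.256000ms;10.000000;20.000000;0.000000' \
  | awk -F';' '{ printf "value=%s warn=%s crit=%s min=%s max=%s\n", $1, $2, $3, $4, (NF >= 5 ? $5 : "(omitted)") }'
# prints: value=rta=1.256000ms warn=10.000000 crit=20.000000 min=0.000000 max=(omitted)
```

Here the fifth (max) field is absent, just as in the check_ping output, so only the minimum is reported.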

Hive query does not begin MapReduce process after starting job and generating Tracking URL

I'm using Apache Hive.
I created a table in Hive (similar to external table) and loaded data into the same using the LOAD DATA LOCAL INPATH './Desktop/loc1/kv1.csv' OVERWRITE INTO TABLE adih; command.
While I am able to retrieve simple data from the hive table adih (e.g. select * from adih, select c_code from adih limit 1000, etc), Hive gives me errors when I ask for data involving slight computations (e.g. select count(*) from adih, select distinct(c_code) from adih).
The Hive CLI output is as follows:
hive> select distinct add_user from adih;
Query ID = latize_20161031155801_8922630f-0455-426b-aa3a-6507aa0014c6
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1477889812097_0006, Tracking URL = http://latize-data1:20005/proxy/application_1477889812097_0006/
Kill Command = /opt/hadoop-2.7.1/bin/hadoop job -kill job_1477889812097_0006
[6]+ Stopped $HIVE_HOME/bin/hive
Hive stops displaying any further logs / actions beyond the last line of "Kill Command"
Not sure where I have gone wrong; many answers on StackOverflow tend to point back to YARN configs (environment config detailed below).
I have the log as well but it contains more than 30000 characters (Stack Overflow limit)
My hadoop environment is configured as follows -
1 Name Node & 1 Data Node. Each has 20 GB of RAM with sufficient ROM. Have allocated 13 GB of RAM for the yarn.scheduler.maximum-allocation-mb and yarn.nodemanager.resource.memory-mb each with the mapreduce.map.memory.mb being set as 4 GB and the mapreduce.reduce.memory.mb being set as 12 GB. Number of reducers is currently set to default (-1). Also, Hive is configured to run with a MySQL DB (rather than Derby).
You should set appropriate values for the properties shown in your trace, e.g. by editing the properties in hive-site.xml:
<property>
  <name>hive.exec.reducers.bytes.per.reducer</name>
  <value>67108864</value>
</property>
Looks like you have set mapred.reduce.tasks = -1, which makes Hive refer to its config to decide the number of reduce tasks.
You are getting an error because the number of reducers is missing in the Hive config.
Try setting it using the command below:
Hive> SET mapreduce.job.reduces=XX
As per the official documentation: the right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * <no. of maximum containers per node>). For example, with 1 data node that can run 8 containers, 0.95 * 1 * 8 ≈ 7 reducers.
I managed to get Hive and MR to work - increased the memory configurations for all the processes involved:
Increased the RAM allocated to YARN Scheduler and maximum RAM allocated to the YARN Nodemanager (in yarn-site.xml), alongside increasing the RAM allocated to the Mapper and Reducer (in mapred-site.xml).
Also incorporated parts of the answers by @Sathiyan S and @vmorusu - set the hive.exec.reducers.bytes.per.reducer property to 1 GB of data, which directly affects the number of reducers that Hive uses (through application of its heuristic techniques).
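Putting the 1 GB setting from this answer into config form, the hive-site.xml fragment would look something like the following (1 GB = 1073741824 bytes; treat the value as a starting point for tuning, not a universal recommendation):

```xml
<!-- hive-site.xml: bytes of input handled per reducer; Hive derives the
     reducer count from this when mapreduce.job.reduces is left at -1 -->
<property>
  <name>hive.exec.reducers.bytes.per.reducer</name>
  <value>1073741824</value>
</property>
```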

How to Display one variable and one constant value in zabbix trigger

I would like to create a Zabbix trigger with the available memory and total memory for a particular host. How is this possible with one trigger? My requirement is to have both the available memory and the total memory listed in the action mail.
For example, you can use a trigger like this to check that available memory is less than 10% of the total:
{Template OS Linux:vm.memory.size[available].max(#3)} <
0.1 * {Template OS Linux:vm.memory.size[total].last()}
In the action email, you can then reference item names, keys, and values like so:
Item values:
1. {ITEM.NAME1} ({HOST.NAME1}:{ITEM.KEY1}): {ITEM.VALUE1}
2. {ITEM.NAME2} ({HOST.NAME2}:{ITEM.KEY2}): {ITEM.VALUE2}
3. {ITEM.NAME3} ({HOST.NAME3}:{ITEM.KEY3}): {ITEM.VALUE3}
In the example above, {ITEM.KEY1} refers to vm.memory.size[available] and {ITEM.KEY2} refers to vm.memory.size[total]. Similarly for other macros. {ITEM.KEY3} will expand to *UNKNOWN*, because there is no third item in the trigger expression.
Such an email format comes with Zabbix 2.2 and Zabbix 2.4 by default.