What's a best practice sampling rate for Perfmon?

What's an acceptable sampling rate with Perfmon? Obviously, the more often we sample, the more our performance sampling affects the performance of the machine itself. I'm hoping someone out there has a good rule of thumb for such a thing.
Evidence and statistics would be even better, but I'd be happy with generally accepted best practices.

The sample rate depends, of course, on how long you sample and how long you want to keep the data for later analysis.
Brent Ozar has a pretty nice video about PerfMon best practices, using it to monitor SQL Server and hardware performance. He recommends a 15-second sample interval for long-term statistics.
If you use Perfmon on a daily basis, running it for 6 hours at that interval generates nearly 1.5k samples, which is fine for me.
Brent Ozar Perfmon
If you are chasing a critical issue that is happening right now, you will of course lower the interval. But for general data collection, a 15-second interval, which means 4 samples per minute, works well.
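For reference, here is a minimal sketch of creating such a collector with a 15-second interval from Python, via the standard Windows `logman` tool. The collector name, counter path, and log location are placeholder examples, not anything Brent Ozar prescribes; adjust them to whatever you actually monitor.

```python
import subprocess

# Create a Perfmon data collector set that samples every 15 seconds.
# Collector name, counter path, and output location are placeholder examples.
cmd = [
    "logman", "create", "counter", "sql_baseline",
    "-c", r"\Processor(_Total)\% Processor Time",
    "-si", "00:00:15",          # sample interval: 15 seconds
    "-o", r"C:\PerfLogs\sql_baseline",
]
subprocess.run(cmd, check=True)

# Start collecting:
subprocess.run(["logman", "start", "sql_baseline"], check=True)
```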
It always depends on what you monitor. VMware's suggestions for monitoring virtualization are the following (a rough data-volume estimate for these intervals follows the list):
problem occurs hourly : sample rate 5 seconds
problem occurs daily : sample rate 90 seconds
problem occurs weekly : sample rate 15 minutes
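To get a feel for what those intervals mean in terms of data volume, here is a quick back-of-the-envelope sketch. The bytes-per-sample figure is an illustrative assumption, not a Perfmon constant.

```python
# Samples produced by one counter at various intervals, per 24 hours of logging.
BYTES_PER_SAMPLE = 40  # rough illustrative guess for one binary log entry

def samples_per_day(interval_s: float) -> int:
    return int(24 * 3600 / interval_s)

for label, interval_s in [("5 s", 5), ("15 s", 15), ("90 s", 90), ("15 min", 900)]:
    n = samples_per_day(interval_s)
    print(f"{label:>6}: {n:>6} samples/day "
          f"(~{n * BYTES_PER_SAMPLE / 1024:.0f} KiB per counter)")

# At 15 s, a 6-hour daily run is 1440 samples, the "nearly 1.5k" mentioned above.
```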

Related

YOLOv5 decreasing inference speed

I am using the YOLOv5x model on my custom dataset. Inference time starts at 0.055 s, then gradually increases up to 2 seconds. The same thing happens during validation: iterations start at 6 seconds and end up taking as much as 34 seconds.
This performance drop happens with every training setting, so I don't think it is about the dataset. On the SSH server I can train without any performance drop.
My GPU is an RTX 2070; I have 16 GB of RAM and an i7-9750H CPU.
Edit:
If I split the images into small parts and wait between the inferences, I get optimal performance. Also, if I run detect on the same part without waiting, I get worse inference times for the same images.
It was caused by thermal throttling. Cleaning the laptop and applying new thermal paste solved the problem. You can also see the original answer on the GitHub page.
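If someone hits the same symptom, a quick way to confirm thermal throttling is to log GPU temperature and SM clock alongside the inference times. Here is a minimal sketch assuming the `pynvml` package (installable as `nvidia-ml-py`) and an NVIDIA GPU at index 0:

```python
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumes the GPU of interest is index 0

try:
    for _ in range(60):  # sample once per second for a minute while inference runs
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        sm_clock = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM)
        print(f"GPU temp: {temp} C, SM clock: {sm_clock} MHz")
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()

# If the temperature climbs while the SM clock drops and inference time grows,
# the slowdown is thermal throttling rather than the model or the dataset.
```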

How often does Google Cloud Preemptible instances preempt (roughly)?

I see that Google Cloud may terminate preemptible instances at any time, but have any unofficial, independent studies been reported, showing "preempt rates" (number of VMs preempted per hour), perhaps sampled in several different regions?
Given how little information I'm finding (as with similar questions), even anecdotes such as: "Looking back the past 6 months, I generally see 3% - 5% instances preempt per hour in uswest1" would be useful (I presume this can be monitored similarly to instance count metrics in AWS).
Clients occasionally want to shove their existing, non-fault-tolerant code into the cloud for "cheap" (despite best practices), and without an expected rate of failure they're often blindsided by the cheapness of preemptible instances. So I'd like to share some typical experiences from the GCP community, even if people's experiences vary, to help convey safe expectations.
Thinking about "unofficial, independent studies", "even anecdotes such as:", and "Clients occasionally want to shove their existing, non-fault-tolerant code in the cloud for 'cheap'", it ought to be said that no architect or sysadmin in their right mind would place production workloads with a defined SLA into an execution environment without an SLA. Hence the topic is rather speculative.
For those who are keen, Google provides a preemption rate expectation:
For reference, we've observed from historical data that the average preemption rate varies between 5% and 15% per day per project, on a seven-day average, occasionally spiking higher depending on time and zone. Keep in mind that this is an observation only: Preemptible instances have no guarantees or SLAs for preemption rates or preemption distributions.
Besides that, there is an interesting edutainment approach to the task of "how to make the inapplicable applicable".
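If you want to turn that 5-15% per day figure into a rough expectation for a client, a naive back-of-the-envelope sketch could look like the following. It assumes preemption behaves as a constant, independent daily probability, which it does not (real preemptions cluster by time and zone), so treat it as purely illustrative:

```python
# Naive survival estimate for a fleet of preemptible VMs, assuming an
# independent, constant daily preemption probability. Purely illustrative.

def expected_survivors(n_vms: int, daily_rate: float, days: int) -> float:
    """Expected number of VMs never preempted after `days` days."""
    return n_vms * (1.0 - daily_rate) ** days

for rate in (0.05, 0.15):
    print(f"daily rate {rate:.0%}: "
          f"{expected_survivors(100, rate, days=7):.0f} of 100 VMs "
          f"untouched after a week")
```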

LoadRunner Truclient vs protocol Http/Html

I'm trying to compare the same script implemented in Web HTTP/HTML against TruClient. Both scenarios use the same think time/wait time, the same number of vusers, and the same pacing.
Is it possible that they show approximately the same time for each transaction but differ so much in the total number of passed transactions?
Thanks in advance.
In the Web HTTP/HTML protocol: response time = processing time + latency (the time taken by the network to transfer data).
In the TruClient protocol: response time = processing time + latency + rendering time.
Hence you will find a difference between the two response times.
And since execution times differ between the two protocols, the total number of passed transactions will also vary.
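To illustrate why the passed-transaction counts diverge even with identical think time, here is a small sketch. All timing values are made-up examples, not measurements from either protocol:

```python
# How client-side rendering time shrinks transactions per vuser over a fixed window.
# All numbers are illustrative, not real measurements.

def transactions_per_vuser(test_duration_s, think_time_s, response_time_s):
    """Transactions one vuser completes when each iteration = response + think."""
    return int(test_duration_s // (response_time_s + think_time_s))

processing = 0.8       # server processing time (s)
latency = 0.2          # network latency (s)
rendering = 1.5        # client-side rendering, seen only by TruClient (s)

http_html = processing + latency
truclient = processing + latency + rendering

for name, rt in [("HTTP/HTML", http_html), ("TruClient", truclient)]:
    tx = transactions_per_vuser(test_duration_s=3600, think_time_s=5, response_time_s=rt)
    print(f"{name:>9}: response {rt:.1f}s -> {tx} transactions per vuser per hour")
```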
The question becomes: what are you trying to measure? Are you trying to measure the response time of your servers, or the weight of the client in overall response time? I put forward the hypothesis that it is possible to measure the client weight by examining the timings captured with the browser developer tools, both during development and during functional testing.
So much of this client weight is related to page architecture that if you are waiting for performance testing to show you that your page architecture is problematic, you likely will not have time to fix the issues and retest before going to production.
I also recommend the collected O'Reilly works of Steve Souders, which will help bring home the client-bound concepts of page design and how much they impact the end-user experience over and above how fast the server responds.
http://www.oreilly.com/pub/au/2951

Stress test cases for web application

What other stress test cases are there, besides finding the maximum number of users who can log into the web application before performance degrades and it eventually crashes?
This question is hard to answer thoroughly since it's too broad.
Anyway, many stress tests depend on the type and execution flow of your workload. There is an entire graduate course dedicated to queueing theory and resource optimization. Most of it can be summarized as follows:
if you have a resource (be it a GPU, CPU, memory bank, mechanical or solid-state disk, etc.), it can serve a number of users/requests per second and takes an X amount of time to complete one unit of work. Make sure you don't exceed its limits.
Some systems can also be studied with a probabilistic approach (Little's Law is one of the most fundamental rules in these cases).
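As a concrete example of Little's Law (L = λ × W: average concurrency equals arrival rate times time spent in the system), here is a tiny sketch with made-up numbers:

```python
# Little's Law: L = lambda * W
#   L      = average number of requests in the system (concurrency)
#   lambda = average arrival rate (requests per second)
#   W      = average time each request spends in the system (seconds)

def concurrency(arrival_rate_per_s: float, avg_response_s: float) -> float:
    return arrival_rate_per_s * avg_response_s

# Illustrative numbers: 200 req/s with a 0.5 s average response time means
# the system holds about 100 requests in flight at any moment.
print(concurrency(200, 0.5))   # -> 100.0
```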
There are a lot of reasons for load/performance testing, many of which may not be important to your project goals. For example:
- What is the performance of the system at a given load? (load test)
- How many users can the system handle while still meeting a specific set of performance goals? (load test)
- How does the performance of the system change over time under a certain load? (soak test)
- When will the system crash under increasing load? (stress test)
- How does the system respond to hardware or environment failures? (stress test)
I've got a post on some common motivations for performance testing that may be helpful.
You should also check out your web analytics data and see what people are actually doing.
It's not enough to simply simulate X number of users logging in. Find the scenarios that represent the most common user activities (anywhere between 2 and 20 scenarios).
Also, make sure you're not just hitting your cache on reads. Add some randomness/diversity to the requests.
I've seen stress tests where all the users requested the same data, which won't give you real-world results.
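A simple way to add that diversity is to parameterize request data instead of replaying one hard-coded value. A minimal sketch, where the URL and parameter names are hypothetical:

```python
import random

# Hypothetical example: vary the product ID and search term per request so
# reads don't all land on the same cached entry.
PRODUCT_IDS = range(1, 50_001)
SEARCH_TERMS = ["shoes", "laptop", "coffee", "headphones", "backpack"]

def next_request_params():
    return {
        "product_id": random.choice(PRODUCT_IDS),
        "q": random.choice(SEARCH_TERMS),
    }

# Each virtual user draws fresh parameters per iteration, e.g.:
#   requests.get("https://example.test/search", params=next_request_params())
print(next_request_params())
```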

Perfmon - Refresh rate of power meter

I'm writing a tool to collect information about power consumption of notebooks. I need to measure the current power consumption, and I use Perfmon to do so. But I found a strange bug.
Here is the typical graph of power consumption (this is "Power Meter" - "Power" - "_Total"):
Measurements are updated about once every 10-15 seconds.
But if I run Everest (or AIDA64) with its Power Management tab open, the counter updates more often and the results are more accurate:
Measurements are updated about once every 1-2 seconds.
I do not understand what happens when we run Everest. I really need to get accurate data.
Do you have any ideas?
I would really appreciate any suggestions in this regard.
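For what it's worth, you can also pull the same counter outside the Perfmon GUI at a 1-second interval and check whether the underlying data actually changes that often. A minimal sketch driving Windows' `typeperf` from Python; the counter path follows the description in the question and may differ per machine or locale:

```python
import subprocess

# Poll the "Power Meter" counter once per second, 60 samples.
# Counter path follows the question ("Power Meter" -> "Power" -> "_Total");
# it may be named differently on other machines or locales.
counter = r"\Power Meter(_Total)\Power"
subprocess.run(
    ["typeperf", counter, "-si", "1", "-sc", "60"],
    check=True,
)
```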