JMeter jp@gc PerfMon Metrics Collector: analysing the results

I am unable to understand the results of the performance metrics generated in JMeter. Can anyone please help me understand these results? Thanks in advance.

Every performance metric uses a different scale. For example, memory utilization can be reported as a percentage or in MB.
I would suggest you look at each metric individually for better understanding.
To do that, you can select Rows -> check only one (for example, memory) -> then go back to the Chart tab to see its graph.
Also, in the settings you can edit 'force maximum Y axis value', e.g. 100 (to read the chart as a percentage).
The same can be applied to the other metrics.
Thanks 😉


Regression problem getting much better results when dividing values by 100

I'm working on a regression problem in pytorch. My target values can be either between 0 to 100 or 0 to 1 (they represent % or % divided by 100).
The data is unbalanced, I have much more data with lower targets.
I've noticed that when I run the model with targets in the range 0-100, it doesn't learn: the validation loss doesn't improve, and the loss on the largest 25% of targets is very big, much bigger than the std within this group.
However, when I run the model with targets in the range 0-1, it does learn and I get good results.
If anyone can explain why this happens, and if using the ranges 0-1 is "cheating", that will be great.
Also - should I scale the targets? (either if I use the larger or the smaller range).
Some additional info: I'm trying to fine-tune BERT for a specific task, and I use MSELoss.
Thanks!
I think your observation relates to batch normalization. There is a paper written on the subject, and numerous Medium/Towards Data Science posts, which I will not list here. The idea is that if you had no non-linearities in your model and loss function, scaling wouldn't matter. But even MSE involves a non-linearity, which makes it sensitive to the scaling of both target and source data. You can experiment with inserting batch normalization layers into your model, after dense or convolutional layers. In my experience it often improves accuracy.
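Independent of batch normalization, a standard (and not "cheating") fix is to standardize the targets before training and invert the transform at prediction time. Here is a minimal numpy sketch; `TargetScaler` is an illustrative name, not a library class:

```python
import numpy as np

class TargetScaler:
    """Standardize regression targets to zero mean / unit variance and
    invert the transform on model predictions afterwards."""

    def fit(self, y):
        self.mean_ = float(np.mean(y))
        self.std_ = float(np.std(y)) or 1.0  # guard against constant targets
        return self

    def transform(self, y):
        return (np.asarray(y, dtype=float) - self.mean_) / self.std_

    def inverse_transform(self, y_scaled):
        return np.asarray(y_scaled, dtype=float) * self.std_ + self.mean_

# Targets in the 0-100 range, skewed toward small values as in the question
rng = np.random.default_rng(0)
y = np.concatenate([rng.uniform(0, 10, 900), rng.uniform(10, 100, 100)])

scaler = TargetScaler().fit(y)
y_scaled = scaler.transform(y)                 # train the model on these
restored = scaler.inverse_transform(y_scaled)  # map predictions back
```

This keeps the loss scale independent of whether your raw targets are 0-1 or 0-100, which is why the 0-1 run behaves better with the same learning rate.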

Huge number of range scans in geomesa cassandra

I am trying to test the GeoMesa Cassandra backend.
I have ingested ~2M points from OSM and sent DWITHIN and BBOX queries to Cassandra using GeoMesa with GeoTools ECQL.
Then I ran some performance tests, and the results do not look reasonable to me.
Cassandra is installed on a Linux machine with a 16-core Xeon, 32GB RAM and one SSD drive. I got ~150 queries per second.
I started to investigate the GeoMesa execution plan for my queries.
The trace logs coming from org.locationtech.geomesa.index.utils.Explainer were really helpful; they do a great job of explaining what is going on.
What looks confusing to me is the number of range scans that go through Cassandra.
For example, I see the following in my logs:
Table: osm_poi_a7_c_osm_5fpoi_5fa7_attr_v2
Ranges (49): SELECT * FROM ..
The number 49 means the actual number of range scans sent to cassandra.
Different queries give me different results, they vary approximately from ~10 to ~130.
10 looks quite reasonable to me but 130 looks enormous.
Could you please explain what causes GeoMesa to send such a huge number of range scans?
Is there any way to decrease the number of range scans?
Maybe there are some configuration options?
Are there other options, like decreasing the precision of the z-index, to improve such queries?
Thanks anyway!
In general, GeoMesa uses common query planning algorithms among its various back-end implementations. The default values are tilted more towards HBase and Accumulo, which support scans with large numbers of ranges. However, there are various knobs you can use to modify the behavior.
You can reduce the number of ranges that are generated at runtime through the system property geomesa.scan.ranges.target (see here). Note that this will be a rough upper limit, so you will generally get more ranges than specified.
When creating your simple feature type schema, you can also disable sharding; the number of shards defaults to 4, and the number of ranges generated will be multiplied by the number of shards. See here and here.
If you are querying multiple 'time bins' (weeks by default), then the number of ranges will be multiplied by the number of time bins you are querying. You can set this to a longer interval when creating your schema; see here.
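As a hedged sketch of those knobs: the property name geomesa.scan.ranges.target comes from the answer above, but the schema user-data keys below are my assumption from the GeoMesa docs and should be verified against your version:

```shell
# Cap the number of ranges the query planner generates (a rough upper
# limit; you will generally still see somewhat more ranges than this)
export JAVA_OPTS="$JAVA_OPTS -Dgeomesa.scan.ranges.target=100"

# Schema-creation user data (assumed key names -- check your GeoMesa docs):
#   geomesa.z.splits    = '1'      # reduce sharding from the default of 4
#   geomesa.z3.interval = 'month'  # widen the time bin from the default week
```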
Thanks,

Is standard deviation (STDDEV) the right function for the job?

We wrote a monitoring system. This monitor is made of agents. Each agent runs on a different server and monitors that specific server's resources (RAM, CPU, SQL Server status, replication status, free disk space, Internet access, specific business metrics, etc.).
The agents report every measure they take to a central database where these "observations" are stored.
For example, every few seconds an agent would store in the central database a specific business metric called "unprocessed_files" with its corresponding value:
(unprocessed_files, 41)
That value is constantly being written to our DB (among many others, as explained above).
We are now implementing a client application: a screen that displays the status of everything we monitor. So, how can we calculate what's a "normal" value and what's a wrong value?
For example, we know that if our servers are working correctly, the unprocessed_files would always be close to 0, but maybe (We don't know yet), 45 is an acceptable value.
So the question is, should we use the Standard Deviation in order to know what the acceptable range of values is?
ACCEPTABLE_RANGE = AVG(value) +- STDDEV(value) ?
We would like to notify with a red color when something is not going well.
For your backlog (unprocessed file) metric, using a standard deviation to know when to sound an alarm (turn something red) is going to drive you crazy with false alarms.
Why? Most of the time your backlog will be zero, so the standard deviation will also be very close to zero. Standard deviation tells you how much your metric varies, so whenever you get a nonzero backlog, it will be outside the avg + stdev range.
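A quick synthetic illustration of why this happens (the numbers are made up, not from the question): with a backlog that is zero 95% of the time and occasionally spikes to 41 files, avg + stdev sits far below the spike value, so every single spike raises an alarm.

```python
import numpy as np

# Synthetic backlog metric: zero 95% of the time, spikes of 41 otherwise
rng = np.random.default_rng(0)
backlog = np.where(rng.random(1000) < 0.95, 0, 41)

avg, stdev = backlog.mean(), backlog.std()
upper = avg + stdev   # roughly 11 for this data -- far below the spikes

alarms = int((backlog > upper).sum())   # readings flagged as "anomalous"
spikes = int((backlog > 0).sum())       # actual nonzero readings
print(alarms == spikes)                 # every nonzero reading alarms
```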
For a backlog, you may want to turn stuff yellow when the value is > 1 and red when the value is > 10.
If you have a "how long did it take" metric, standard deviation might be a valid way to identify alarm conditions. For example, you might have a web request that usually takes about half a second, but typically varies from 0.25 to 0.8 second. If they suddenly start taking 2.5 seconds, then you know something has gone wrong.
Standard deviation is a measurement that makes most sense for a normal distribution (bell curve distribution). When you handle your measurements as if they fit a bell curve, you're implicitly making the assumption that each measurement is entirely independent of the others. That assumption works poorly for typical metrics of a computing system (backlog, transaction time, load average, etc). So, using stdev is OK, but not great. You'll probably struggle to make sense of stdev numbers: that's because they don't actually make much sense.
You'd be better off, as @duffymo suggested, looking at the 95th percentile (the worst-performing operations). But MySQL doesn't compute those kinds of distributions natively. PostgreSQL does, as do Oracle Standard Edition and higher.
How do you determine an out-of-bounds metric? It depends on the metric and on what you're trying to do. If it's a backlog measurement, and it grows from minute to minute, you have a problem to investigate. If it's a transaction time, and it's far longer than average (avg + 3 x stdev, for example), you have a problem. The open source monitoring system Nagios has worked this out for various kinds of metrics.
Read a book by N. N. Taleb called "The Black Swan" if you want to know how assuming the real world fits normal distributions can crash the global economy.
Standard deviation is just a way of characterizing how much a set of values spreads away from its average (i.e. mean). In a sense, it's an "average deviation from average", though a little more complicated than that. It is true that values which differ from the mean by many times the standard deviation tend to be rare, but that doesn't mean the standard deviation is a good benchmark for identifying anomalous values that might indicate something is wrong.
For one thing, if you set your acceptable range at the average plus or minus one standard deviation, you're probably going to get very frequent results outside that range! You could use the average plus or minus two standard deviations, or three, or however many you want to reduce the number of notifications/error conditions as low as you want, but there's no telling whether any of this actually helps you identify error conditions.
I think your main problem is not statistics. Your problem is that you don't know what kinds of results actually indicate an error. So before you program in any acceptable range, just let the system run for a while and collect some calibration data showing what kinds of values you see when it's running normally, and what kinds of values you see when it's not running normally. Make sure you have some way to tell which are which. Once you have a good amount of data for both conditions, you can analyze it (start with a simple histogram) and see what kinds of values are characteristic of normal operation and what kinds are characteristic of error conditions. Then you can set your acceptable range based on that.
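A minimal sketch of that calibration idea (function names and the Poisson baseline are illustrative assumptions, not from the question): record values during known-good operation, then derive yellow/red thresholds from high percentiles of that baseline.

```python
import numpy as np

def threshold_from_baseline(baseline, percentile):
    """Derive an alert threshold from values recorded while the system
    was known to be healthy."""
    return float(np.percentile(baseline, percentile))

def status(value, yellow, red):
    """Map a live reading to a traffic-light status."""
    return "red" if value > red else ("yellow" if value > yellow else "green")

# Calibration data gathered during normal operation (illustrative only)
rng = np.random.default_rng(0)
healthy = rng.poisson(lam=2.0, size=5000)

yellow = threshold_from_baseline(healthy, 95.0)
red = threshold_from_baseline(healthy, 99.9)
print(status(1, yellow, red), status(100, yellow, red))
```

The percentile choices are tunable knobs: lower them for earlier warnings, raise them for fewer false alarms.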
If you want to get fancy, there is a statistical technique called likelihood ratio testing that can help you evaluate just how likely it is that your system is working properly. But I think it's probably overkill. Monitoring systems don't need to be super-precise about this stuff; just show a cautionary notice whenever the readings start to seem abnormal.

Finding images in RAM dump

Extracting screenshots from RAM dumps
Some classical security / hacking challenges include having to analyze a dump of the physical RAM of a system. volatility does a great job of extracting useful information, including a wireframe view of the windows displayed at the time (using the command screenshot). But I would like to go further and find the actual content of the windows.
So I'd like to reformulate the problem as finding raw images (think matrix of pixels) in a large file. If I can do this, I hope to find the content of the windows, at least partially.
My idea was to rely on the fact that a row of pixels is similar to the next one. If I find a large enough number of lines of the same size, then I let the user fiddle around with an interactive tool and see if it decodes to something interesting.
For this, I would compute a kind of spectrogram. More precisely, a heatmap where the shade shows how likely it is for the block of data #x to be part of an image whose rows are y bytes wide, with x and y the axes of the spectrogram. Then I'd just have to look for horizontal lines in it. (See the examples below.)
The problem I have right now is to find a method to compute that "kind of spectrogram" accurately and quickly. As an order of magnitude, I would like to be able to find images of width 2048 in RGBA (8192 bytes per row) in a 4GB file in a few minutes. That means processing a few tens of MB per second.
I tried using FFT and autocorrelation, but they do not show the kind of accuracy I'm after.
The problem with FFT
Since finding the length of a mostly repeating pattern looks like finding a frequency, I tried to use a Fourier transform with 1 byte = 1 sample and plot the absolute value of the spectrum.
But the main problem is the period resolution. Since I'm interested in finding the period of the signal (the byte length of a row of pixels), I want to plot the spectrogram with period length on the y axis, not frequency. But the way the discrete Fourier transform works is that it computes the frequencies that are multiples of 1/n (for n data points), which gives me a very low resolution for large periods and a higher-than-needed resolution for short periods.
Here is a spectrogram computed with this method on a 144x90 RGB BMP file. We expect a peak at an offset 432. The window size for the FFT was 4320 bytes.
And the segment plot of the first block of data.
I calculated that if I need to distinguish between periods k and k+1, then I need a window size of roughly k². So for 8192 bytes, that makes the FFT window about 16MB. Which would be way too slow.
So the FFT computes too much information that I don't need and not enough of the information that I do need. But given a reasonable window size, it usually shows a sharp peak at about the right period.
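To make the resolution problem concrete, here is a small numpy sketch, reusing the window size and target period from the example above: the only periods an n-point DFT can represent are n/k for integer k, so around 432 bytes the candidate periods are tens of bytes apart.

```python
import numpy as np

# With an n-sample DFT, the measurable periods are n/k for integer k,
# so the resolution near a period p is roughly p^2/n samples.
n = 4320                      # FFT window size from the example above
k = np.arange(1, 20)          # DFT bin indices
periods = n / k               # the only period lengths the DFT can report

# Candidate periods near the expected row length of 432 bytes:
near = periods[(periods > 300) & (periods < 600)]
print(near)                   # neighbouring candidates are ~25-60 bytes apart
```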
The problem with autocorrelation
The other method I tried is to use a kind of discrete autocorrelation to plot the spectrogram.
More exactly, what I compute is the cross-correlation between a block of data and half of it, and I only compute it for the offsets where the small block lies fully inside the large block. The size of the large block has to be at least twice the max period to plot.
Here is an example of spectrogram computed with this method on the same image as before.
And the segment plot of the autocorrelation of the first block of data.
Although it produces just the right amount of data, the value of the autocorrelation changes slowly, so it does not make a sharp peak at the right period.
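For reference, here is a small numpy sketch of that block-vs-half cross-correlation, on synthetic data with identical rows (where the peak is sharp; real image data behaves as described above):

```python
import numpy as np

def estimate_row_period(block, max_period):
    """Correlate a block with its first half at every candidate lag and
    return the lag with the highest normalized correlation."""
    block = block.astype(np.float64)
    block -= block.mean()
    half = block[: len(block) // 2]
    best_lag, best_score = 0, -np.inf
    for lag in range(1, max_period + 1):
        seg = block[lag : lag + len(half)]
        denom = np.linalg.norm(half) * np.linalg.norm(seg)
        score = (half @ seg) / denom if denom else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic "image": ten identical rows of 432 random bytes, light noise
rng = np.random.default_rng(0)
row = rng.integers(0, 256, 432).astype(np.float64)
data = np.tile(row, 10) + rng.normal(0, 2, 4320)
print(estimate_row_period(data, 600))   # peak at lag 432
```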
Question
Is there a way to get both a sharp peak at the correct period and enough precision for large periods, either by tweaking the aforementioned algorithms or by using a completely different one?
I can't judge much about the FFT part. From the title ("Finding images in RAM dump") it seems you are trying to solve a bigger problem and FFT is only a part of it, so let me answer on those parts where I have some knowledge.
analyze the RAM dump of a system
This sounds much like physical RAM. If an application takes a screenshot, that screenshot is in virtual RAM. This results in two problems:
a) the screenshot may be incomplete, because parts of it are paged out to disk
b) you need to perform a physical address to virtual address mapping in order to bring the bytes of the screenshot into correct order
find raw images from the memory dump
To me, it is unclear what exactly a raw image is. Any application storing an image will use an image file format; storing only the data would make the screenshot useless.
In order to perform an FFT on the data, you should probably know whether it uses 24 bit per pixel or 32 bit per pixel.
I hope to find a screenshot or the content of the current windows
This would require an application that takes screenshots. You can of course hope. I can't judge about the likeliness of that.
rely on the fact that a row of pixels is similar to the next one
You probably hope to find some black on white text. For that, the assumption may be ok. If the user is viewing his holiday pictures, this may be different.
Also note that many values in a PC are 32 bit (Integer, Float) and 0x00000000 is a very common value. Your algorithm may detect this.
images of width 2048
Is this just a guess? Or would you finally brute-force all common screen sizes?
in RGBA
Why RGBA? A screenshot typically does not have transparency.
With all of the above, I wonder whether it wouldn't be more efficient to search for image signatures like JPEG, BMP or PNG headers in the dump and then analyze those headers and simply get the picture from the metadata.
Note that this has been done before, e.g. WinDbg has some commands in the ext debugger extension which is loaded by default
!findgifs
!findjpegs
!findjpgs
!findpngs

wrong result in spatial data query sql server

I'm trying to get familiar with spatial data using SQL Server 2008.
I use book: http://www.beginningspatial.com/
and there is sample data: http://www.census.gov/geo/cob/bdy/zt/z500shp/zt06_d00_shp.zip
I uploaded it to my SQL Server database and the spatial results display well. Everything seems to be OK, but when I try to calculate the distance between two geometries, the result is really small.
For example, 0.27.
The same happens if I try to calculate the STLength of a given region; that result is also very small, 0.19 for example.
I wonder why the results are so small. I used SRID 4269 when importing the data.
Does anyone have any idea why that is? I think the result should be in meters.
Thanks for any advice.
STLength returns units in the data source's unit of measure. In your case, you chose SRID 4269, which stores the data as latitude/longitude, so your base units are degrees, not a standard linear unit like feet or meters. To calculate the distance in linear units, you should load your data in a projected format.
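As an alternative to reprojecting (a different technique than the answer above suggests): SQL Server also offers the geography type, whose STDistance returns meters directly. A hedged T-SQL sketch with made-up coordinates:

```sql
-- Hypothetical points (SRID 4269); geography::Point takes latitude
-- first, then longitude
DECLARE @a geography = geography::Point(37.77, -122.42, 4269);
DECLARE @b geography = geography::Point(37.80, -122.27, 4269);

-- STDistance on the geography type returns linear units (meters),
-- unlike geometry, where it returns degrees for lat/long data
SELECT @a.STDistance(@b) AS distance_in_meters;
```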
You might want to refer to this very similar question, as well.