How can I get a cumulated count graph with statsd and graphite? - configuration

I'm using statsd (the latest version from the git master branch) with graphite (0.9.10) as the backend.
In my (Django) code I call statsd.incr("signups") when a user signs up. In graphite's web interface, I now see a beautiful graph showing the number of signups per second under Graphite/stats/signups. When I look at the graph under Graphite/stats_counts/signups, I expect to see the total number of signups, but it looks like it's the number of signups per 10s interval (that's statsd's refresh interval, I guess).
I did configure storage-aggregation.conf, perhaps I got it wrong somehow? Also, I stopped carbon (not with stop, but really killed it, as apparently just stopping it doesn't allow it to reload the configuration). I also deleted the /opt/graphite/storage/whisper/stats_counts directory. Then I restarted the carbon daemon. I still get the number of signups per 10s interval. :-(
Here's my configuration:
# /opt/graphite/conf/storage-aggregation.conf
[lower]
pattern = \.lower$
xFilesFactor = 0.1
aggregationMethod = min
[upper]
pattern = \.upper$
xFilesFactor = 0.1
aggregationMethod = max
[upper_90]
pattern = \.upper_90$
xFilesFactor = 0.1
aggregationMethod = max
[count]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum
[count_ps]
pattern = \.count_ps$
xFilesFactor = 0
aggregationMethod = sum
[sum]
pattern = \.sum$
xFilesFactor = 0
aggregationMethod = sum
[sum_90]
pattern = \.sum_90$
xFilesFactor = 0
aggregationMethod = sum
[stats_counts]
pattern = ^stats_counts\.
xFilesFactor = 0
aggregationMethod = sum
[min]
pattern = \.min$
xFilesFactor = 0.1
aggregationMethod = min
[max]
pattern = \.max$
xFilesFactor = 0.1
aggregationMethod = max
[default_average]
pattern = .*
xFilesFactor = 0.5
aggregationMethod = average
And this:
# /opt/graphite/conf/storage-schemas.conf
[stats]
priority = 110
pattern = ^stats.*
retentions = 10s:6h,1m:7d,10m:1y
I'm starting to think that I did everything right and that Graphite is really doing what it's supposed to be doing. So the question is:
What's the proper way to configure statsd & graphite to draw the total number of signups since the beginning of time?
I guess I could change my Django code to count the total number of users, once in a while, and then use a gauge instead of an incr, but it feels like graphite should be able to just sum up whatever it receives, on the fly, not just when it aggregates data.
Edit:
Using Graphite's web interface, in the Graphite composer, I applied the integral function to the basic "signups per second" graph (in Graphite/stats/signups), and I got the desired graph (i.e. the total number of signups). Is this the appropriate way to get a cumulated graph? It's annoying because I need to select the full date range since the beginning of time, I cannot zoom into the graph, or else I just get the integral of the zoomed part. :-(

Yes the integral() function is the correct way of doing this. Since StatsD is stateless in that regard (all collected data is reset/deleted after the flush to Graphite occurs) there is no way for it to be able to sum up all received data since a certain point.
From the Graphite documentation of the integral() function:
This will show the sum over time, sort of like a continuous addition function. Useful for finding totals or trends in metrics that are collected per minute.

Related

How to save the model weights after running train_detector in mmdetection?

cfg.optimizer.lr = 0.02 / 8
cfg.lr_config.warmup = None
cfg.log_config.interval = 600
# Change the evaluation metric since we use customized dataset.
cfg.evaluation.metric = 'bbox'
# We can set the evaluation interval to reduce the evaluation times
cfg.evaluation.interval = 3
# We can set the checkpoint saving interval to reduce the storage cost
cfg.checkpoint_config.interval = 3
# Set seed thus the results are more reproducible
cfg.seed = 0
set_random_seed(0, deterministic=False)
cfg.gpu_ids = range(1)
cfg.load_from = 'gdrive/My Drive/mmdetection/checkpoints/vfnet_r50_fpn_mdconv_c3-
c5_mstrain_2x_coco_20201027pth-6879c318.pth'
cfg.work_dir = "../vinbig"
cfg.runner.max_epochs = 6
cfg.total_epochs = 6model = build_detector(cfg.model)
datasets = [build_dataset(cfg.data.train)]
train_detector(model, datasets[0], cfg, distributed=False, validate=True)
Now,my question is once I have finetuned the model on my custom dataset,how do i use it for testing? Where is the finetuned model stored?
At most of the places, the model is immediately used for testing but how do I save the finetuned model to be tested later on.
img = mmcv.imread('kitti_tiny/training/image_2/000068.jpeg')
model.cfg = cfg
result = inference_detector(model, img)
show_result_pyplot(model, img, result)
The above is what occurs mostly after the training phase. But that is because the model is already in runtime.How can i create my own mmdetection model checkpoint?
I have been working on Google colab.
Well, I don't know how to do it manually, but your checkpoints are automatically
saved in cfg.work_dir = "../vinbig". There you can find 'latest.pth' file as your final checkpoint.
Took me a while to find, because the documentation in mmdet.core.evaluation.eval_hooks is not very clear, but the old version at their readthedocs describes a save_best attribute to the EvalHook
save_best (str, optional): If a metric is specified, it would measure
the best checkpoint during evaluation. The information about best
checkpoint would be save in best.json.
Options are the evaluation metrics to the test dataset. e.g.,
``bbox_mAP``, ``segm_mAP`` for bbox detection and instance
segmentation. ``AR#100`` for proposal recall. If ``save_best`` is
``auto``, the first key will be used. The interval of
``CheckpointHook`` should device EvalHook. Default: None.
To enable it, you can just add the argument to the evaluation attribute in the config:
cfg.evaluation = dict(interval= 2, metric='mAP', save_best='mAP')
This will test the model on the validation set every 2 epochs and save the checkpoint that obtained the best mAP metric (in your case it might need to be bbox instead), in addition to every checkpoint indicated by the checkpoint_config.interval attribute.
:)
Edit: mmdet's EvalHook inherits from mmcv's EvalHook, where the documentation is complete.

anova_test not returning Mauchly's for three way within subject ANOVA

I am using a data set called sleep (found here: https://drive.google.com/file/d/15ZnsWtzbPpUBQN9qr-KZCnyX-0CYJHL5/view) to run a three way within subject ANOVA comparing Performance based on Stimulation, Deprivation, and Time. I have successfully done this before using anova_test from rstatix. I want to look at the sphericity output but it doesn't appear in the output. I have got it to come up with other three way within subject datasets, so I'm not sure why this is happening. Here is my code:
anova_test(data = sleep, dv = Performance, wid = Subject, within = c(Stimulation, Deprivation, Time))
I also tried to save it to an object and use get_anova_table, but that didn't look any different.
sleep_aov <- anova_test(data = sleep, dv = Performance, wid = Subject, within = c(Stimulation, Deprivation, Time))
get_anova_table(sleep_aov, correction = "GG")
This is an ideal dataset I pulled from the internet, so I'm starting to think the data had a W of 1 (perfect sphericity) and so rstatix is skipping this output. Is this something anova_test does?
Here also is my code using a dataset that does return Mauchly's:
weight_loss_long <- pivot_longer(data = weightloss, cols = c(t1, t2, t3), names_to = "time", values_to = "loss")
weight_loss_long$time <- factor(weight_loss_long$time)
anova_test(data = weight_loss_long, dv = loss, wid = id, within = c(diet, exercises, time))
Not an expert at all, but it might be because your factors have only two levels.
From anova_summary() help:
"Value
return an object of class anova_test a data frame containing the ANOVA table for independent measures ANOVA. However, for repeated/mixed measures ANOVA, it is a list containing the following components are returned:
ANOVA: a data frame containing ANOVA results
Mauchly's Test for Sphericity: If any within-Ss variables with more than 2 levels are present, a data frame containing the results of Mauchly's test for Sphericity. Only reported for effects that have more than 2 levels because sphericity necessarily holds for effects with only 2 levels.
Sphericity Corrections: If any within-Ss variables are present, a data frame containing the Greenhouse-Geisser and Huynh-Feldt epsilon values, and corresponding corrected p-values. "

Mean_squared_error output in function includes dtype and '0'

I want to calculate test statistics for a fb prophet forecast in a function because I want to average the test stats over different forecasts and cutoff points after using the fb-prophet cross_validation to get df_cv. I created a function that I apply to the dataframe after grouping by the cutoff points, in order to receive a measure per cutoff point. Then I calculate the mean over all these values.
The problem is that my function returns not only the value I am looking for but also a 0 as well as an information of the dtype. I can still do calculations with the returned value but when I want to plot etc. later it is very inconvenient. How can I strip these unnecessary values from the output?
def compute_avg_stats(df_cv,perf_measure):
measures = {'mse':mean_squared_error,'mae':mean_absolute_error,'mape':mean_absolute_percentage_error,'rmse':mean_squared_error}
performance_stats = {}
if perf_measure == 'rmse':
measure = np.sqrt(measures[perf_measure](y_true=df_cv['y'],y_pred=df_cv['yhat']))
else:
measure = measures[perf_measure](y_true=df_cv['yu'],y_pred=df_cv['yhat'])
return measure
df_cv.groupby('cutoff').apply(compute_avg_stats,perf_measure='rmse').to_frame().mean()
I think .mean() returns a Series. Try with .mean()[0]

How to get dataset into array

I have worked all the tutorials and searched for "load csv tensorflow" but just can't get the logic of it all. I'm not a total beginner, but I don't have much time to complete this, and I've been suddenly thrown into Tensorflow, which is unexpectedly difficult.
Let me lay it out:
Very simple CSV file of 184 columns that are all float numbers. A row is simply today's price, three buy signals, and the previous 180 days prices
close = tf.placeholder(float, name='close')
signals = tf.placeholder(bool, shape=[3], name='signals')
previous = tf.placeholder(float, shape=[180], name = 'previous')
This article: https://www.tensorflow.org/guide/datasets
It covers how to load pretty well. It even has a section on changing to numpy arrays, which is what I need to train and test the 'net. However, as the author says in the article leading to this Web page, it is pretty complex. It seems like everything is geared toward doing data manipulation, where we have already normalized our data (nothing has really changed in AI since 1983 in terms of inputs, outputs, and layers).
Here is a way to load it, but not in to Numpy and no example of not manipulating the data.
with tf.Session as sess:
sess.run( tf.global variables initializer())
with open('/BTC1.csv') as csv_file:
csv_reader = csv.reader(csv_file, delimiter =',')
line_count = 0
for row in csv_reader:
?????????
line_count += 1
I need to know how to get the csv file in to the
close = tf.placeholder(float, name='close')
signals = tf.placeholder(bool, shape=[3], name='signals')
previous = tf.placeholder(float, shape=[180], name = 'previous')
so that I can follow the tutorials to train and test the net.
It's not that clear for me your question. You might be answering, tell me if I'm wrong, how to feed data in your model? There are several fashions to do so.
Use placeholders with feed_dict during the session. This is the basic and easier one but often suffers from training performance issue. Further explanation, check this post.
Use queue. Hard to implement and badly documented, I don't suggest, because it's been taken over by the third method.
tf.data API.
...
So to answer your question by the first method:
# get your array outside the session
with open('/BTC1.csv') as csv_file:
csv_reader = csv.reader(csv_file, delimiter =',')
dataset = np.asarray([data for data in csv_reader])
close_col = dataset[:, 0]
signal_cols = dataset[:, 1: 3]
previous_cols = dataset[:, 3:]
# let's say you load 100 row each time for training
batch_size = 100
# define placeholders like you
...
with tf.Session() as sess:
...
for i in range(number_iter):
start = i * batch_size
end = (i + 1) * batch_size
sess.run(train_operation, feed_dict={close: close_col[start: end, ],
signals: signal_col[start: end, ],
previous: previous_col[start: end, ]
}
)
By the third method:
# retrieve your columns like before
...
# let's say you load 100 row each time for training
batch_size = 100
# construct your input pipeline
c_col, s_col, p_col = wrapper(filename)
batch = tf.data.Dataset.from_tensor_slices((close_col, signal_col, previous_col))
batch = batch.shuffle(c_col.shape[0]).batch(batch_size) #mix data --> assemble batches --> prefetch to RAM and ready inject to model
iterator = batch.make_initializable_iterator()
iter_init_operation = iterator.initializer
c_it, s_it, p_it = iterator.get_next() #get next batch operation automatically called at each iteration within the session
# replace your close, signal, previous placeholder in your model by c_it, s_it, p_it when you define your model
...
with tf.Session() as sess:
# you need to initialize the iterators
sess.run([tf.global_variable_initializer, iter_init_operation])
...
for i in range(number_iter):
start = i * batch_size
end = (i + 1) * batch_size
sess.run(train_operation)
Good luck!

Build network with shortcut using torch

I now have a network with 2 inputs X and Y.
X concatenates Y and then pass to network to get result1. And at the same time X will concat result1 as a shortcut.
It's easy if there is only one input.
branch = nn.Sequential()
branch:add(....) --some layers
net = nn.Sequential()
net:add(nn.ConcatTable():add(nn.Identity()):add(branch))
net:add(...)
But when it comes to two inputs I don't actually know how to do it? Besides, nngraph is not allowed.Does any one know how to do it?
You can use the table modules, have a look at this page: https://github.com/torch/nn/blob/master/doc/table.md
net = nn.Sequential()
triple = nn.ParallelTable()
duplicate = nn.ConcatTable()
duplicate:add(nn.Identity())
duplicate:add(nn.Identity())
triple:add(duplicate)
triple:add(nn.Identity())
net:add(triple)
net:add(nn.FlattenTable())
-- at this point the network transforms {X,Y} into {X,X,Y}
separate = nn.ConcatTable()
separate:add(nn.SelectTable(1))
separate:add(nn.NarrowTable(2,2))
net:add(separate)
-- now you get {X,{X,Y}}
parallel_XY = nn.ParallelTable()
parallel_XY:add(nn.Identity()) -- preserves X
parallel_XY:add(...) -- whatever you want to do from {X,Y}
net:add(parallel)
parallel_Xresult = nn.ParallelTable()
parallel_Xresult:add(...) -- whatever you want to do from {X,result}
net:add(parallel_Xresult)
output = net:forward({X,Y})
The idea is to start with {X,Y}, to duplicate X and to do your operations. This is clearly a bit complicated, nngraph is supposed to be here to do that.