I'm new to WEKA. Could you explain the following results, which I got from training on my data with MultilayerPerceptron (a neural network)?
Could you also give me at least some links that would help me understand this?
=== Run information ===
Scheme:weka.classifiers.functions.MultilayerPerceptron -L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H a -G -R
Relation: Dengue
Instances: 520
Attributes: 12
MinTemp
MaxTemp
MeanTemp
RelativeHumidity
Rainfall
Wind
LandArea
IncomeClass
WasteGenerated
PopulationDensity
HouseNumber
Dengue
Test mode:evaluate on training data
=== Classifier model (full training set) ===
Linear Node 0
Inputs Weights
Threshold 1.045699824540429
Node 1 -0.7885241220010747
Node 2 -0.5679021300029351
Node 3 -0.6990681220652758
Node 4 -1.7036399417988182
Node 5 -1.7986596505677839
Node 6 -1.0031026344357001
Sigmoid Node 1
Inputs Weights
Threshold -2.7846715622473632
Attrib MinTemp -0.3756262925227143
Attrib MaxTemp -1.0113362508935868
Attrib MeanTemp -0.6867107452689675
Attrib RelativeHumidity -1.357278537485863
Attrib Rainfall 0.9346189251054217
Attrib Wind -2.4697988150023895
Attrib LandArea -0.04802972345084459
Attrib IncomeClass -0.0023757695994812353
Attrib WasteGenerated -0.5219516258114455
Attrib PopulationDensity 0.6275856253232837
Attrib HouseNumber 0.4794517421072107
Sigmoid Node 2
Inputs Weights
Threshold -2.238113558499396
Attrib MinTemp 0.6634817443452294
Attrib MaxTemp 0.04177526569735764
Attrib MeanTemp 0.4213111516398967
Attrib RelativeHumidity 0.9477161615423007
Attrib Rainfall -0.06941110528380763
Attrib Wind 0.1398767209217198
Attrib LandArea 0.011908782901326666
Attrib IncomeClass -0.03177518077905532
Attrib WasteGenerated -2.111275394512881
Attrib PopulationDensity -0.002225384228836655
Attrib HouseNumber -0.18689477740073276
Sigmoid Node 3
Inputs Weights
Threshold -1.5469990007413668
Attrib MinTemp -0.538188914566223
Attrib MaxTemp 0.2452404814154855
Attrib MeanTemp -0.07155897171503904
Attrib RelativeHumidity -0.6490463479419373
Attrib Rainfall 1.2010399306686497
Attrib Wind 0.7275195821368675
Attrib LandArea -0.033472141554108756
Attrib IncomeClass 0.021303339082304765
Attrib WasteGenerated -0.12403826628027773
Attrib PopulationDensity -0.2663352902864381
Attrib HouseNumber 0.5153046727550502
Sigmoid Node 4
Inputs Weights
Threshold -1.3273158445760431
Attrib MinTemp -0.511476470658412
Attrib MaxTemp -1.4472764735477759
Attrib MeanTemp -0.992550007766579
Attrib RelativeHumidity -0.4889201348001783
Attrib Rainfall 4.777705232733897
Attrib Wind 1.0057960261924193
Attrib LandArea 0.01594686951090471
Attrib IncomeClass -0.012053049723794618
Attrib WasteGenerated -0.29397677127551647
Attrib PopulationDensity 0.8760275665744505
Attrib HouseNumber 0.26513119051179107
Sigmoid Node 5
Inputs Weights
Threshold 0.9085281334048771
Attrib MinTemp -2.3264253136843633
Attrib MaxTemp 4.342385678707546
Attrib MeanTemp 1.26274142914379
Attrib RelativeHumidity 0.3589371377240767
Attrib Rainfall -6.060544069949767
Attrib Wind -1.7001357028288409
Attrib LandArea -0.04696606932834255
Attrib IncomeClass -0.02765457448569584
Attrib WasteGenerated -4.685692052378084
Attrib PopulationDensity 0.7497806979087069
Attrib HouseNumber -1.817884131764966
Sigmoid Node 6
Inputs Weights
Threshold -2.343332128576834
Attrib MinTemp -1.7808827758329944
Attrib MaxTemp 2.3738961064086217
Attrib MeanTemp 0.6053466030736496
Attrib RelativeHumidity 0.4178221348007889
Attrib Rainfall 0.2646387686505043
Attrib Wind 0.6941590574632328
Attrib LandArea 0.022879267506905346
Attrib IncomeClass -0.030599400189594162
Attrib WasteGenerated 0.2341906598765536
Attrib PopulationDensity -0.054518515830522876
Attrib HouseNumber -0.6802930287343757
Class
Input
Node 0
Time taken to build model: 17.83 seconds
=== Evaluation on training set ===
=== Summary ===
Correlation coefficient 0.7747
Mean absolute error 1.477
Root mean squared error 1.9605
Relative absolute error 110.9364 %
Root relative squared error 86.4544 %
Total Number of Instances 518
Ignored Class Unknown Instances 2
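You ran the Multilayer Perceptron (MLP) algorithm against the data. MLP is a neural network trained with backpropagation. Because your class attribute (Dengue) is numeric, it predicts a number (regression) rather than a class label, which is why the output is a linear node and the summary reports a correlation coefficient and error measures instead of accuracy. I'm going to assume you are familiar with basic statistics, the concept of backpropagation, and artificial neural networks, since you chose this particular algorithm to train your model. If this is not the case, you have put the cart before the horse and need to learn the math before using this model. Here is a training presentation that may help you if this is the case.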
After it says 'run information,' it shows the command you ran and all the parameters which you set (explained in Weka documentation - you chose them or at least went with defaults). After this it shows you are using the Dengue file (presumably data related to the fever and demographics of those infected, but since you chose this data I would presume you have a basic understanding of how it was collected and what the data is). Instances is the number of samples in the data file, and attributes is the number of columns.
The section under "Classifier model" is the network itself: its nodes and their weights, as learned by backpropagation. The nodes in the hidden layer are all sigmoid units, but the output node is a linear unit (i.e., Linear Node 0 is your output unit and Sigmoid Nodes 1-6 are your six hidden units; the -H a option sets the hidden layer size to (attributes + classes) / 2, which is (11 + 1) / 2 = 6 here). All the values given are interconnection weights, and each Threshold is the node's bias term. You can use them to calculate the network's predictions by hand; the summary below the network reports how well those predictions match your data.
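To make the "calculate by hand" step concrete, here is a minimal Python sketch of one forward pass through a network of this shape. It rests on assumptions not shown in the output: that each node computes its threshold plus the weighted sum of its inputs, and that the attribute values you feed in are already scaled the way WEKA scaled them during training (by default it normalizes the attributes and a numeric class, so the result would be on the normalized class scale). The function names and the toy hidden layer are made up for illustration; only the NODE0 weights are copied from the listing above.

```python
import math


def sigmoid(z):
    """Logistic activation used by the hidden (sigmoid) nodes."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_nodes, output_node):
    """Compute the output of a one-hidden-layer network.

    inputs       -- list of attribute values (assumed already normalized)
    hidden_nodes -- list of (threshold, weights) pairs, one per sigmoid node
    output_node  -- (threshold, weights) pair for the linear output node
    """
    hidden_out = [
        sigmoid(threshold + sum(w * x for w, x in zip(weights, inputs)))
        for threshold, weights in hidden_nodes
    ]
    out_threshold, out_weights = output_node
    # Linear unit: no activation function, just threshold + weighted sum.
    return out_threshold + sum(w * h for w, h in zip(out_weights, hidden_out))


# Linear Node 0, copied from the WEKA listing (threshold, weights for Nodes 1-6).
NODE0 = (
    1.045699824540429,
    [
        -0.7885241220010747,
        -0.5679021300029351,
        -0.6990681220652758,
        -1.7036399417988182,
        -1.7986596505677839,
        -1.0031026344357001,
    ],
)

# Hypothetical toy hidden layer (all zeros) just to exercise the function;
# substitute the real Sigmoid Node 1-6 thresholds and weights to reproduce WEKA.
toy_hidden = [(0.0, [0.0, 0.0]) for _ in range(6)]
prediction = forward([0.3, -0.2], toy_hidden, NODE0)
print(prediction)
```

With the all-zero toy hidden layer every sigmoid node outputs 0.5, so the result is simply the output threshold plus half the sum of the output weights; with the real hidden-layer weights the same function walks through the full network.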
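As I just said, the bottom part summarizes the final results calculated from the network. Note that the test mode is "evaluate on training data", so these figures describe the fit to the training set, not performance on unseen data. This part is all basic statistics, so I won't elaborate any further.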
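For readers who want the summary figures spelled out, the following Python sketch computes the same five quantities from lists of actual and predicted values. It assumes the usual definitions, with the "relative" errors measured against a baseline that always predicts the mean of the actual values; the function name and the sample numbers are invented for illustration and are not taken from the Dengue data.

```python
import math


def regression_summary(actual, predicted):
    """Return correlation, MAE, RMSE, and the two relative errors (in percent)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n

    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))

    # Errors of the baseline that always predicts the mean of the actual values.
    base_abs = sum(abs(a - mean_a) for a in actual)
    base_sq = sum((a - mean_a) ** 2 for a in actual)

    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)

    return {
        "correlation": cov / math.sqrt(base_sq * var_p),
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_percent": 100.0 * abs_err / base_abs,
        "rrse_percent": 100.0 * math.sqrt(sq_err / base_sq),
    }


# Made-up values, only to show the call.
summary = regression_summary([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 3.0, 3.0])
print(summary)
```

Under these definitions, a relative absolute error above 100 % (as in the output above) would mean the network's absolute error is larger than that of simply predicting the mean every time, even though the correlation is fairly high.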
The goal is to train YOLO with multiple GPUs. According to Darknet (AlexeyAB), we should first train YOLO on a single GPU for 1000 iterations and then continue with multiple GPUs from the saved weights (1000_iter.weights). So, don't we need to change any parameters in the .cfg file?
Here is my .cfg from training my model on a single GPU:
[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=16
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1
AlexeyAB says to modify the .cfg "if you get NaN". In my case I'm not getting NaN, but my loss is fluctuating. Should we change anything when we continue training with multiple GPUs: batch, subdivisions, learning_rate, burn_in? Or do we just continue training with the same configuration?
You will need to change burn_in, max_batches and steps between the two cases. For example, if your final target is 500200 iterations, your first .cfg file should have this:
burn_in=100
max_batches = 50000
policy=steps
steps=40000,45000
and the second file like this:
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
You only need to change learning_rate if you get NaN. In that case, according to the AlexeyAB instructions, you should divide learning_rate by the number of GPUs and multiply burn_in by the same number.
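The NaN adjustment is simple arithmetic, sketched here in Python. The helper name is made up for illustration and is not part of Darknet; it only applies the rule stated above (divide learning_rate by the GPU count, multiply burn_in by it) to the values from the question's .cfg.

```python
def adjust_for_multi_gpu(learning_rate, burn_in, num_gpus):
    """Scale learning_rate down and burn_in up by the number of GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return learning_rate / num_gpus, burn_in * num_gpus


# Values from the single-GPU .cfg in the question, assuming 4 GPUs.
new_learning_rate, new_burn_in = adjust_for_multi_gpu(0.001, 1000, 4)
print(f"learning_rate={new_learning_rate}")
print(f"burn_in={new_burn_in}")
```

The two resulting values are what you would write into the learning_rate and burn_in lines of the multi-GPU .cfg; with one GPU the function returns the inputs unchanged.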
I am new to CatBoost and I am running CatBoostClassifier training with logging_level = "Info". My data consists of both categorical and numerical variables.
Firstly, for one of the categorical variables, I get the following message in the printed info: feature 21 is redundant categorical feature, skipping it. How is the redundancy of this feature determined?
Furthermore, I'm a bit lost as to what all the info for the iterations stands for. Here is the info output for one single iteration of my training:
{Feature1} pr0 tb1 type0, border=10 score 2.001737609
Feature2, bin=40 score 2.867480488
{Feature3, Feature2 b40} pr2 tb2 type0, border=6 score 3.533462883
Feature4, bin=5 score 4.105045044
46: learn: -1.2759319 total: 13.2s remaining: 843ms
In this case, Feature1 and Feature3 are categorical, while Feature2 and Feature4 are numerical.
What do values like pr0, tb1, type0, score, etc. stand for? Any pointer to documentation would be much appreciated.
I am using the community-contributed command esttab with the rename() option.
I have a special situation in which I run multiple regressions where each regression has a coefficient that is from a different (similarly-named) variable, but each corresponds to the same idea.
Here is a (very contrived) toy example:
sysuse auto, clear
rename weight mpg1
rename mpg mpg2
rename turn mpg3
I want to display the results of three regressions, but have only one line for mpg1, mpg2, and mpg3 (instead of each one appearing on a separate line).
One way to accomplish this is to do the following:
eststo clear
eststo: quietly reg price mpg1
eststo: quietly reg price mpg2
eststo: quietly reg price mpg3
esttab, rename(mpg1 mpg mpg2 mpg mpg3 mpg)
Can I rename all of the variables at the same time by doing something such as rename(mpg* mpg)?
If I want to run a large number of regressions, it becomes more advantageous to do this instead of writing them all out by hand.
Stata's rename group command can handle abbreviations and wildcards, but the rename() option of estout cannot. For the latter, you need to build the list of old and new names yourself and store it in a local macro.
Below you can find an improved version of your toy example code:
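sysuse auto, clear
eststo clear
// rename weight, mpg and turn to mpg1, mpg2 and mpg3 in one step
rename (weight mpg turn) mpg#, addnumber
forvalues i = 1 / 3 {
    eststo: quietly reg price mpg`i'
    // append this regression's old/new name pair to the rename list
    local mpglist `mpglist' mpg`i' mpg
}
esttab, rename(`mpglist')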
------------------------------------------------------------
                      (1)             (2)             (3)
                    price           price           price
------------------------------------------------------------
mpg                 2.044***       -238.9***        207.6**
                    (5.42)         (-4.50)          (2.76)
_cons              -6.707         11253.1***      -2065.0
                   (-0.01)          (9.61)         (-0.69)
------------------------------------------------------------
N                      74              74              74
------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
I know that one can get predicted values as follows:
reg y x1 x2 x3
predict pred_values
Let's say that I run a regression and store the values:
reg y x1 x2
matrix stored_b = e(b)
And then I run another regression (doesn't matter what).
Is it possible to use the predict command using stored_b instead of the current e(b)?
(Of course, I could generate the predicted values by manually computing them based on stored_b, but this could get tedious if there are many coefficients.)
There's no need to create a matrix. Stata has commands that facilitate the task. Try estimates store and estimates restore. An example:
clear
set more off
sysuse auto
// initial regression/predictions
regress price weight
estimates store myest
predict double resid, residuals
// second regression/prediction
regress price mpg
predict double residdiff, residuals
// backup and predict from initial regression results
estimates restore myest
predict double resid2, residuals
// should pass
assert resid == resid2
// should fail
assert resid == residdiff
I am preparing for a units quiz and there are two kinds of conversions that have me stumped.
Type one:
What is length (in ns) of one cycle on a XXX computer?
- In this case, XXX can be some number of MHz or GHz, chosen at random. I am having trouble converting the cycle times. Example:
What is length (in ns) of one cycle on a 50 MegaHertz (MHz) computer?
The second type of conversion I have trouble with:
If the average instruction on a XXX computer requires ZZ cycles, how long (in ns) does the average instruction take to execute?
- As in the previous case, XXX will be some number of MHz or GHz. For example:
If the average instruction on a 2.0 GigaHertz (GHz) computer requires 2.0 cycles, how long (in ns) does the average instruction take to execute?
I don't understand what I am doing wrong in these conversions but I keep getting them wrong. Any help would be great!
I hope my math is correct; I'll give it a try.
One Hertz is defined as one cycle per second, so a 1 Hz computer has a 10^9 ns cycle length (because nano is 10^-9).
50 Mega = 50 * 10^6, so 50MHz yields a (10^9 ns / (50 * 10^6)) = 20 ns cycle length.
2 Giga = 2 * 10^9, so 2GHz yields a (10^9 ns / (2 * 10^9)) = 0.5 ns cycle length. Two cycles here take 1 ns.
The unit for frequency is Hz which is the same as 1/s or s^-1. To convert from frequency to length (really time) you have to compute the reciprocal value: length = 1/frequency.
What is length (in ns) of one cycle on a 50 MegaHertz (MHz) computer?
1/(50*10^6 Hz) = 2*10^-8 s = 20*10^-9 s = 20 ns
If the average instruction on a 2.0 GigaHertz (GHz) computer requires 2.0 cycles, how long (in ns) does the average instruction take to execute?
One cycle: 1/(2*10^9 Hz) = 0.5*10^-9 s = 0.5 ns
Two cycles: 1 ns
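Both conversions come down to taking the reciprocal of the frequency, which can be sketched in a few lines of Python. The function and constant names are chosen here for illustration; the two calls reproduce the worked examples above.

```python
MHZ = 1e6  # hertz per megahertz
GHZ = 1e9  # hertz per gigahertz


def cycle_time_ns(frequency_hz):
    """Length of one clock cycle in nanoseconds: (1 / f) seconds = (1e9 / f) ns."""
    return 1e9 / frequency_hz


def instruction_time_ns(frequency_hz, cycles_per_instruction):
    """Average instruction time: cycles per instruction times the cycle length."""
    return cycles_per_instruction * cycle_time_ns(frequency_hz)


print(cycle_time_ns(50 * MHZ))              # 20.0
print(instruction_time_ns(2.0 * GHZ, 2.0))  # 1.0
```

Keeping the frequency in plain Hz and converting to nanoseconds only at the end avoids mixing up the mega, giga and nano prefixes along the way.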