I wanted to run the [FCN code][1] for semantic segmentation. However, I am a beginner in Caffe and I did not know where to start.
Is there any step-by-step guidance for running it?
Since I could not get much help here, I am posting the steps myself. They might be helpful for anyone who is inexperienced (like me). It took me a long time to figure out how to run the code and get results; you may be able to run it successfully, but, as in my case, the results may stay a blank image for a long time until you finally work out how the settings should be.
I was able to run FCN8s on my data successfully by following these steps:
Divide the data into two sets (train, validation), and do the same for the labels of the corresponding images, so that you have 4 folders altogether: train_img_lmdb, train_label_lmdb, val_img_lmdb and val_label_lmdb.
Convert your data (each set separately) into LMDB format (if it is not RGB, convert it using cv2); you will end up with 4 lmdb folders, each containing data.mdb and lock.mdb. The sample code is available here, and a minimal sketch follows.
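A minimal Python sketch of that conversion (write_lmdb and the paths are hypothetical; it assumes pycaffe, lmdb and numpy are installed):

import lmdb
import numpy as np
from caffe.proto import caffe_pb2

def write_lmdb(db_path, images):
    # images: an iterable of HxWxC uint8 arrays (RGB images or label maps)
    env = lmdb.open(db_path, map_size=int(1e12))  # generous map size
    with env.begin(write=True) as txn:
        for i, img in enumerate(images):
            datum = caffe_pb2.Datum()
            datum.channels = img.shape[2]
            datum.height = img.shape[0]
            datum.width = img.shape[1]
            datum.data = img.transpose(2, 0, 1).tobytes()  # HxWxC -> CxHxW
            txn.put('{:08d}'.format(i).encode('ascii'), datum.SerializeToString())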
Download the .caffemodel from the URL that the authors have provided.
Change the paths in the train_val.prototxt file to the paths of your lmdb folders. You should have 4 data layers whose source is the path to train_img_lmdb, train_label_lmdb, val_img_lmdb and val_label_lmdb respectively, similar to this link.
Add a convolution layer after this line (here I have five classes; change num_output to match the number of classes in your ground-truth images):
layer {
  name: "score_5classes"
  type: "Convolution"
  bottom: "score"
  top: "score_5classes"
  convolution_param {
    num_output: 5
    pad: 0
    kernel_size: 1
  }
}
Change the loss layer as follows (the bottom name must match the name of the layer you just added):
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "score_5classes"
  bottom: "label"
  top: "loss"
  loss_param {
    normalize: true
  }
}
Run the model to start training, assuming you have pycaffe and a working Caffe environment installed:
caffe train -solver=/path/to/solver.prototxt -weights /path/to/pre-trained/model/fcn8s-heavy-pascal.caffemodel 2>&1 | tee /path/to/save/training/log/file/fcn8_exp1.log
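Alternatively, training can be driven from Python through the pycaffe solver interface; a sketch (the paths are placeholders):

import caffe

caffe.set_mode_gpu()  # or caffe.set_mode_cpu()
solver = caffe.SGDSolver('/path/to/solver.prototxt')
solver.net.copy_from('/path/to/fcn8s-heavy-pascal.caffemodel')  # pre-trained weights
solver.solve()  # train according to the solver settings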
I hope this is helpful. Thanks to @Shai for his help.
My network has multiple losses, and I would like to test only one particular loss. I have sliced softmax into softmax_1 and softmax_2, and only softmax_1 is used for that loss.
### slice softmax into softmax_1: Nx6xHxW, softmax_2: Nx1xHxW
layer {
  name: "slice_conv1_1_D"
  type: "Slice"
  bottom: "softmax"
  top: "softmax_1"
  top: "softmax_2"
  slice_param {
    slice_dim: 1
    slice_point: 6
  }
}
However, running the network with the slice layer above made the network produce tons of softmax_2 values, as they are not used by any other layer.
Is there a way to slice my blob "softmax", keep only "softmax_1", and discard "softmax_2" completely?
Thank you very much for your help.
Added:
I know I could set loss_weight to 0 for the other losses. However, I don't want to consider that option because of the computational cost.
The SilenceLayer is exactly what you're looking for - it's a special "do nothing" layer that only takes inputs but does not produce any output, keeping your log clean:
layer {
  name: "silence"
  type: "Silence"
  bottom: "softmax_2"
}
I have a prototxt like:
layer {
  name: "l1"
  bottom: "b1"
  top: "t1"
  include {
    phase: TRAIN
  }
}
layer {
  name: "l1"
  bottom: "b1"
  top: "t2"
  include {
    phase: TEST
  }
}
There are two layers with:
- the same name
- different blobs
- different phases
Which weights will be used in the test phase?
1. The weights learned in the train phase (because the layers have the same name)
2. Random initial weights
Caffe will attempt to use the weights learned in the train phase in the test phase, but an error will stop the testing if either of the two conditions below isn't satisfied:
- the numbers of the two layers' blobs are equal
- the shapes (the size in every dimension) of the two layers' blobs are consistent
In fact, the layer in the test net will always try to copy weights from the layer with the same name in the trained net, and it checks the number and shape of the weight blobs to make sure it uses the proper weights.
Details can be found in the function template <typename Dtype> void Net<Dtype>::ShareTrainedLayersWith(const Net* other), which is called by the test net object to copy weights from the trained net at the beginning of testing.
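The same name-matching behavior can be seen from pycaffe: constructing a net with a .caffemodel copies weights into layers by matching layer names, so "l1" in the TEST net receives the weights trained for "l1". A sketch (the file names are placeholders):

import caffe

# Weights are matched to layers by name when the net is constructed.
net = caffe.Net('train_val.prototxt', 'trained.caffemodel', caffe.TEST)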
I started with Caffe and the mnist example ran well.
I have the training data and labels in data.mat (300 training samples with 30 features, and labels of (-1, +1), all saved in data.mat).
However, I don't quite understand how I can use Caffe with my own dataset.
Is there a step-by-step tutorial that can teach me?
Many thanks!!!! Any advice would be appreciated!
I think the most straightforward way to transfer data from Matlab to Caffe is via an HDF5 file.
First, save your data in Matlab in an HDF5 file using hdf5write. I assume your training data is stored in a variable named X of size 300-by-30, and the labels are stored in y, a 300-by-1 vector:
hdf5write('my_data.h5', '/X', ...
    single( permute(reshape(X, [300, 30, 1, 1]), [4:-1:1]) ));
hdf5write('my_data.h5', '/label', ...
    single( permute(reshape(y, [300, 1, 1, 1]), [4:-1:1]) ), ...
    'WriteMode', 'append');
Note that the data is saved as a 4D array: the first dimension is the number of samples, the second is the feature dimension, and the last two are 1 (representing no spatial dimensions). Also note that the names given to the datasets in the HDF5 file are "X" and "label" - these names should be used as the "top" blobs of the input data layer.
Why permute? Please see this answer for an explanation.
You also need to prepare a text file listing the names of all HDF5 files you are using (in your case, only my_data.h5). The file /path/to/list/file.txt should have a single line:
/path/to/my_data.h5
Now you can add an input data layer to your train_val.prototxt
layer {
  type: "HDF5Data"
  name: "data"
  top: "X"     # note: same name as in HDF5
  top: "label" #
  hdf5_data_param {
    source: "/path/to/list/file.txt"
    batch_size: 20
  }
  include { phase: TRAIN }
}
For more information regarding the HDF5 input layer, see this answer.
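If you prefer Python over Matlab, the same file can be written with h5py; a sketch with stand-in data (no permute is needed because h5py already writes in row-major order):

import h5py
import numpy as np

X = np.random.randn(300, 30)          # stand-in for your features
y = np.random.choice([-1., 1.], 300)  # stand-in for your labels

with h5py.File('my_data.h5', 'w') as f:
    # Caffe expects N x C x H x W in single precision.
    f.create_dataset('X', data=X.reshape(300, 30, 1, 1).astype(np.float32))
    f.create_dataset('label', data=y.reshape(300, 1, 1, 1).astype(np.float32))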
I'm trying to understand how data is interpreted in Caffe.
For that I've taken a look at the MNIST tutorial.
Looking at the input data definition:
layers {
  name: "mnist"
  type: DATA
  data_param {
    source: "mnist_train_lmdb"
    backend: LMDB
    batch_size: 64
    scale: 0.00390625
  }
  top: "data"
  top: "label"
}
I've now looked at the mnist_train_lmdb and taken one of the entries (shown in hex):
0801101C181C229006
00000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000
00000000000054B99F973C2400000000000000000000000000000000
000000000000DEFEFEFEFEF1C6C6C6C6C6C6C6C6AA34000000000000
00000000000043724872A3E3FEE1FEFEFEFAE5FEFE8C000000000000
000000000000000000000011420E4343433B15ECFE6A000000000000
00000000000000000000000000000000000053FDD112000000000000
000000000000000000000000000000000016E9FF5300000000000000
000000000000000000000000000000000081FEEE2C00000000000000
000000000000000000000000000000003BF9FE3E0000000000000000
0000000000000000000000000000000085FEBB050000000000000000
00000000000000000000000000000009CDF83A000000000000000000
0000000000000000000000000000007EFEB600000000000000000000
00000000000000000000000000004BFBF03900000000000000000000
0000000000000000000000000013DDFEA60000000000000000000000
00000000000000000000000003CBFEDB230000000000000000000000
00000000000000000000000026FEFE4D000000000000000000000000
00000000000000000000001FE0FE7301000000000000000000000000
000000000000000000000085FEFE3400000000000000000000000000
000000000000000000003DF2FEFE3400000000000000000000000000
0000000000000000000079FEFEDB2800000000000000000000000000
0000000000000000000079FECF120000000000000000000000000000
00000000000000000000000000000000000000000000000000000000
2807
(I've added the line breaks here to be able to see the '7' digit.)
Now my question is, where is this format described? Or, put differently, where is it defined that the first 36 bytes are some sort of header and the last 8 bytes have some label correspondence?
How would I go about constructing my own data?
Neither the Blob Tutorial nor the Layers Definition gives much away about required formats. My intention is not to use image data, but time series.
Thanks!
I realized that protocol buffers must come into play here. So I tried to deserialize it against some of the types defined in caffe.proto.
Datum seems to be the perfect fit:
{Caffe.Datum}
Channels: 1
Data: {byte[784]}
Encoded: false
FloatData: Count = 0
Height: 28
Label: 7
Width: 28
So the answer is simply: it's a serialized representation of a Datum-typed instance, as defined in caffe.proto.
By the way, since English is not my native language, I first had to realize that "Datum" is the singular form of "data".
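For reference, the same deserialization can be done in Python; a sketch (it assumes pycaffe and the lmdb package are installed, and that mnist_train_lmdb is in the working directory):

import lmdb
from caffe.proto import caffe_pb2

env = lmdb.open('mnist_train_lmdb', readonly=True)
with env.begin() as txn:
    key, value = next(txn.cursor().iternext())  # first entry
    datum = caffe_pb2.Datum()
    datum.ParseFromString(value)
    print(datum.channels, datum.height, datum.width, datum.label)  # e.g. 1 28 28 7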
When it comes to using your own data, it's structured as follows:
The conventional blob dimensions for data are number N x channel K x height H x width W. Blob memory is row-major in layout, so the last / rightmost dimension changes fastest. For example, the value at index (n, k, h, w) is physically located at index ((n * K + k) * H + h) * W + w.
See Blobs, Layers, and Nets: anatomy of a Caffe model for reference.
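In code, the quoted offset formula is simply (a hypothetical helper, for illustration only):

def blob_offset(n, k, h, w, K, H, W):
    # Row-major offset of element (n, k, h, w) in an N x K x H x W blob.
    return ((n * K + k) * H + h) * W + w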
I can try to answer your second question. Since Caffe only takes data in a handful of selected formats like LMDB, HDF5, etc., it is best to convert (or generate, in the case of synthetic data) your data into one of these formats. The following links can help you with this. If you have trouble with import hdf5 in Python, you may refer to this page.
Creating an LMDB file in Python
Writing an HDF5 file in Python
HDF5 more examples
While calculating the dimensions of the SSD Object Detection pipeline, we found that for the layer named "pool3", with parameters:
pooling_param {
  pool: MAX
  kernel_size: 2
  stride: 2
}
the input dimensions are 75x75x256 (WxHxC)
and according to the formula Wout = (Win - kernel + 2*padding)/stride + 1, the output dimension for width comes out to be (75 - 2 + 0)/2 + 1 = 37.5.
However, the paper shows the output size at this point as 38, which is also what the following code reports for this network:
net.blobs['pool3'].shape
The answer seems to be simply that the Caffe framework 'ceils' it, but according to this post (and this one as well) it should be 'flooring' it, and the answer should be 37.
So can anyone explain how Caffe treats these non-integral output sizes?
There's something called padding: when the output feature map size is not a whole number, Caffe's pooling layer rounds it up, effectively padding the input feature map with 0's. That's a standard procedure, though it may not be explicitly mentioned.
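For what it's worth, Caffe's pooling layer computes the output size with ceiling rounding; a sketch of that computation (mirroring the formula above) reproduces the 38:

import math

def pooled_dim(in_dim, kernel, stride, pad=0):
    # Pooling output size with ceil rounding, as Caffe's pooling layer computes it.
    return int(math.ceil((in_dim + 2 * pad - kernel) / float(stride))) + 1

print(pooled_dim(75, kernel=2, stride=2))  # (75 - 2) / 2 = 36.5 -> ceil 37, +1 = 38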