Caffe: change data layer after surgery

I trained an FC network with an HDF5 data layer, then used net surgery to transplant it into a convolutional network, and then changed the data layer to a probe-suitable Input layer, i.e.
from:
layer {
  name: "layer_data_left"
  type: "HDF5Data"
  top: "data_left"
  top: "labels_left"
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "/home/me/Desktop/trainLeftPatches.txt"
    batch_size: 128
  }
}
to:
layer {
  name: "data_left"
  type: "Input"
  top: "data_right"
  input_param { shape: { dim: 1 dim: 1 dim: 1241 dim: 367 } }
}
Is there any reason this would go out of memory?
>>> fc_net.forward()
F0729 20:02:02.205382 6821 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
Aborted (core dumped)
Or is it more likely that I made a mistake somewhere in the surgery and in exchanging the data layers?
Thank you.
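For reference, a minimal pycaffe sketch of how such an Input-layer net would typically be loaded and fed; the file names are placeholders, and the blob name follows the top of the Input layer above:

import numpy as np
import caffe

caffe.set_mode_gpu()

# prototxt containing the Input layer above plus the transplanted weights;
# both file names are placeholders
fc_net = caffe.Net('deploy_conv.prototxt', 'after_surgery.caffemodel', caffe.TEST)

# one full-size probe image, matching shape { dim: 1 dim: 1 dim: 1241 dim: 367 }
probe = np.zeros((1, 1, 1241, 367), dtype=np.float32)
fc_net.blobs['data_right'].data[...] = probe
out = fc_net.forward()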

Related

problems of training RCF model using caffe

I'm using Caffe to train a model. I am sure I have connected the data layer to train.txt via the source of image_data_param, but when I run ./train.sh it always complains that it cannot find the images.
Environment: Ubuntu 18.04, OpenCV 3, Python 2.
layer {
  name: "data"
  type: "ImageLabelmapData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: false
    mean_value: 104.00699
    mean_value: 116.66877
    mean_value: 122.67892
  }
  image_data_param {
    root_folder: "/home/yogang/Desktop/rawdata/train/"
    source: "/home/yogang/Desktop/rawdata/train.txt"
    batch_size: 1
    shuffle: true
    new_height: 0
    new_width: 0
  }
}
I0828 19:29:13.834946 14079 layer_factory.hpp:77] Creating layer data
I0828 19:29:13.835011 14079 net.cpp:101] Creating Layer data
I0828 19:29:13.835031 14079 net.cpp:409] data -> data
I0828 19:29:13.835059 14079 net.cpp:409] data -> label
I0828 19:29:13.835124 14079 image_labelmap_data_layer.cpp:42] Opening file /home/yogang/Desktop/rawdata/train.txt
I0828 19:29:13.835505 14079 image_labelmap_data_layer.cpp:52] Shuffling data
I0828 19:29:13.835677 14079 image_labelmap_data_layer.cpp:57] A total of 242 images.
E0828 19:29:13.836748 14079 io.cpp:80] Could not open or find file /home/yogang/Desktop/rawdata/train//home/yogang/Desktop/rawdata/train/satellite144.jpg
E0828 19:29:13.836797 14079 io.cpp:80] Could not open or find file /home/yogang/Desktop/rawdata/train//home/yogang/Desktop/rawdata/train/400.jpg
F0828 19:29:13.836818 14079 image_labelmap_data_layer.cpp:86] Check failed: cv_img.data Could not load /home/yogang/Desktop/rawdata/train/satellite144.jpg
*** Check failure stack trace: ***
./train.sh: line 8: 14079 Aborted (core dumped) ./solve.py
I think your image paths are wrong; check them. The log shows root_folder being prepended to entries of train.txt that are already absolute (".../train//home/yogang/..."), so the entries should be relative to root_folder (or root_folder should be left empty).
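A quick way to see which entries fail is a minimal sketch along these lines (assuming each line of train.txt holds an image path and a label-map path):

import os

root = "/home/yogang/Desktop/rawdata/train/"
listing = "/home/yogang/Desktop/rawdata/train.txt"

# The data layer concatenates root_folder with every entry in train.txt,
# so absolute entries come out as ".../train//home/yogang/...".
# Print every entry that does not resolve to an existing file.
with open(listing) as f:
    for line in f:
        for entry in line.split():
            full = root + entry  # what the layer effectively tries to open
            if not os.path.isfile(full):
                print("missing: " + full)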

Unknown blob input data to layer 0 in caffe

I am getting following error while using my caffe prototxt:
F0329 17:37:40.771555 24587 insert_splits.cpp:35] Unknown blob input data to layer 0
*** Check failure stack trace: ***
The first two layers in my Caffe prototxt are given below:
layers {
  name: "data"
  type: IMAGE_DATA
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  image_data_param {
    source: "train2.txt"
    batch_size: 100
    new_height: 28
    new_width: 28
    is_color: false
  }
}
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 1
  blobs_lr: 3
  convolution_param {
    num_output: 8
    kernel_size: 9
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
What could be the possible reason for this?
It seems your IMAGE_DATA layer is only defined for the TRAIN phase, so the blobs data and label are not defined for the TEST phase. I suspect you see no error when the solver builds the train-phase net; the error only appears once the test-phase net is built.
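A quick way to confirm this is to build each phase of the net separately with pycaffe (the prototxt file name below is a placeholder):

import caffe

# the TRAIN-phase net has the IMAGE_DATA layer, so it initializes fine
caffe.Net('my_net.prototxt', caffe.TRAIN)

# the TEST-phase net has no data layer at all, so "data" is an unknown
# blob for its first layer (conv1) and initialization aborts
caffe.Net('my_net.prototxt', caffe.TEST)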

Check failed: status == CUDNN_STATUS_SUCCESS (3 vs. 0) CUDNN_STATUS_BAD_PARAM for FASTER RCNN Library

I am testing Faster R-CNN. The installation is fine.
During the installation I had an issue with cuDNN 5.1; I followed the suggestion here, and now the installation is fine.
Now I run the demo as
./tools/demo.py
and get the following error:
I1117 09:48:41.011925 12503 net.cpp:51] Initializing net from parameters:
name: "VGG_ILSVRC_16_layers"
state {
  phase: TEST
  level: 0
}
.
.
.
layer {
  name: "cls_prob"
  type: "Softmax"
  bottom: "cls_score"
  top: "cls_prob"
}
I1117 09:48:41.012234 12503 layer_factory.hpp:77] Creating layer input
I1117 09:48:41.012251 12503 net.cpp:84] Creating Layer input
I1117 09:48:41.012259 12503 net.cpp:380] input -> data
I1117 09:48:41.012271 12503 net.cpp:380] input -> im_info
I1117 09:48:41.328574 12503 net.cpp:122] Setting up input
I1117 09:48:41.328608 12503 net.cpp:129] Top shape: 1 3 224 224 (150528)
I1117 09:48:41.328614 12503 net.cpp:129] Top shape: 1 3 (3)
I1117 09:48:41.328618 12503 net.cpp:137] Memory required for data: 602124
I1117 09:48:41.328624 12503 layer_factory.hpp:77] Creating layer conv1_1
I1117 09:48:41.328655 12503 net.cpp:84] Creating Layer conv1_1
I1117 09:48:41.328660 12503 net.cpp:406] conv1_1 <- data
I1117 09:48:41.328670 12503 net.cpp:380] conv1_1 -> conv1_1
F1117 09:48:41.676553 12503 cudnn.hpp:128] Check failed: status == CUDNN_STATUS_SUCCESS (3 vs. 0) CUDNN_STATUS_BAD_PARAM
*** Check failure stack trace: ***
Aborted (core dumped)
What is wrong with my installation for this Faster R-CNN?
I have CUDA 8.0, and libcudnn5_5.1.10-1+cuda8.0 is installed on Ubuntu 16.04.
My graphics card is a Quadro K4200.
Now it works for me. libcudnn5_5.1 is for CUDA 7.5 (you can check the GPU and driver requirements in cuDNN's user guide), so I changed to cuDNN v6.0 for CUDA 8.0.
Then you may face the issue of
Check failed: error == cudaSuccess (8 vs. 0) invalid device function
For that, you need to edit py-faster-rcnn/lib/fast_rcnn/config.py and change
__C.USE_GPU_NMS = True
to
__C.USE_GPU_NMS = False
Now it works.
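To confirm which cuDNN version is actually installed, a quick check can be done along these lines (the header path is an assumption and may differ on your system):

import re

# read the version macros from cudnn.h; adjust the path if cuDNN lives
# elsewhere on your system (e.g. /usr/include/cudnn.h)
with open('/usr/local/cuda/include/cudnn.h') as f:
    header = f.read()
version = dict(re.findall(r'#define CUDNN_(MAJOR|MINOR|PATCHLEVEL)\s+(\d+)', header))
print('cuDNN %(MAJOR)s.%(MINOR)s.%(PATCHLEVEL)s' % version)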

Cannot find mean file while training on Imagenet

I am trying to train and validate a network on ImageNet. The validation process works without any problems (with the pretrained weights). However, when I try to run the training, I get an error that the imagenet_mean.binaryproto file is not found, the very same file that worked for the validation process. What is wrong?
...
I0222 15:29:15.108032 15823 net.cpp:399] data -> label
I0222 15:29:15.108057 15823 data_transformer.cpp:25] Loading mean file from: /home/myuser/learning/caffe/data/ilsvrc12/imagenet_mean.binaryproto
F0222 15:29:15.108577 15830 db_lmdb.hpp:14] Check failed: mdb_status == 0 (2 vs. 0) No such file or directory
*** Check failure stack trace: ***
# 0x7fc82857edaa (unknown)
# 0x7fc82857ece4 (unknown)
# 0x7fc82857e6e6 (unknown)
# 0x7fc828581687 (unknown)
# 0x7fc828ba115e caffe::db::LMDB::Open()
# 0x7fc828b75644 caffe::DataReader::Body::InternalThreadEntry()
# 0x7fc828cc1470 caffe::InternalThread::entry()
# 0x7fc81f4a8a4a (unknown)
# 0x7fc826a98184 start_thread
# 0x7fc8271b437d (unknown)
# (nil) (unknown)
Aborted (core dumped)
Here is the prototxt I am using:
name: "CaffeNet"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 227
    mean_file: "/home/myuser/learning/caffe/data/ilsvrc12/imagenet_mean.binaryproto"
    #mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
  # mean pixel / channel-wise mean instead of mean image
  # transform_param {
  #   crop_size: 227
  #   mean_value: 104
  #   mean_value: 117
  #   mean_value: 123
  #   mirror: true
  # }
  data_param {
    source: "examples/imagenet/ilsvrc12_train_lmdb"
    batch_size: 256
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 227
    mean_file: "/home/myuser/learning/caffe/data/ilsvrc12/imagenet_mean.binaryproto"
    #mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
  # mean pixel / channel-wise mean instead of mean image
  # transform_param {
  #   crop_size: 227
  #   mean_value: 104
  #   mean_value: 117
  #   mean_value: 123
  #   mirror: false
  # }
  data_param {
    source: "/sdc/repository/myuser/Imagenet2012/Imagenet2012trainLMDB"
    #source: "examples/imagenet/ilsvrc12_val_lmdb"
    batch_size: 50
    backend: LMDB
  }
}
layer {
  name: "conv1"
  …
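A quick way to sanity-check the paths referenced by the TRAIN data layer is a small script like the one below; note that the stack trace above points at caffe::db::LMDB::Open(), i.e. the LMDB source, and that the relative source path is resolved against the directory caffe is started from:

import os

# paths taken from the TRAIN data layer above; the LMDB source is a
# relative path, so it is resolved against the current working directory
paths = [
    "/home/myuser/learning/caffe/data/ilsvrc12/imagenet_mean.binaryproto",
    "examples/imagenet/ilsvrc12_train_lmdb",
]
for p in paths:
    print(p + " -> " + ("found" if os.path.exists(p) else "missing"))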

How do I specify num_test_nets in restart?

I trained a GoogleNet model for a while, and now I'd like to restart from a checkpoint, adding a test phase. I have the test already in my train_val.prototxt file, and I added the proper parameters to my solver.prototxt ... but I get an error on the restart:
I0712 15:53:02.615947 47646 net.cpp:278] This network produces output loss2/loss1
I0712 15:53:02.615964 47646 net.cpp:278] This network produces output loss3/loss3
I0712 15:53:02.616109 47646 net.cpp:292] Network initialization done.
F0712 15:53:02.616665 47646 solver.cpp:128] Check failed: param_.test_iter_size() == num_test_nets (1 vs. 0) test_iter must be specified for each test network.
*** Check failure stack trace: ***
# 0x7f550cf70e6d (unknown)
# 0x7f550cf72ced (unknown)
# 0x7f550cf70a5c (unknown)
# 0x7f550cf7363e (unknown)
# 0x7f550d3b605b caffe::Solver<>::InitTestNets()
# 0x7f550d3b63ed caffe::Solver<>::Init()
# 0x7f550d3b6738 caffe::Solver<>::Solver()
# 0x7f550d4fa633 caffe::Creator_SGDSolver<>()
# 0x7f550da5bb76 caffe::SolverRegistry<>::CreateSolver()
# 0x7f550da548f4 train()
# 0x7f550da52316 main
# 0x7f5508f43b15 __libc_start_main
# 0x7f550da52d3d (unknown)
solver.prototxt
train_net: "<my_path>/train_val.prototxt"
test_iter: 1000
test_interval: 4000
test_initialization: false
display: 40
average_loss: 40
base_lr: 0.01
lr_policy: "step"
stepsize: 320000
gamma: 0.96
max_iter: 10000000
momentum: 0.9
weight_decay: 0.0002
snapshot: 40000
snapshot_prefix: "models/<my_path>"
solver_mode: CPU
train_val.prototxt train and test layers:
name: "GoogleNet"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 224
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  data_param {
    source: "/<blah>/ilsvrc12_train_lmdb"
    batch_size: 32
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: true
    crop_size: 224
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  data_param {
    source: "/<blah>/ilsvrc12_val_lmdb"
    batch_size: 32
    backend: LMDB
  }
}
You should modify one place in your solver.prototxt, from
train_net: "/train_val.prototxt"
to
net: "/train_val.prototxt"
The solver does not use the value of "train_net" to initialize a test net, so the test phase you added was never found by the solver. The parameters "train_net" and "test_net" initialize only a train net and only a test net, respectively, while "net" is used for both.
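If in doubt, the relevant solver fields can be inspected with the protobuf API (the file name below is a placeholder); test_iter is a repeated field with one entry per test net, which is exactly what the failed check compares against:

from caffe.proto import caffe_pb2
from google.protobuf import text_format

s = caffe_pb2.SolverParameter()
with open('solver.prototxt') as f:  # placeholder path
    text_format.Merge(f.read(), s)

# with only "train_net:" set the solver builds zero test nets, so the
# single test_iter entry has nothing to pair with; "net:" lets the solver
# build the TEST-phase net from train_val.prototxt as well
print('net: ' + (s.net or '(unset)'))
print('train_net: ' + (s.train_net or '(unset)'))
print('test_iter entries: ' + str(list(s.test_iter)))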