I am new to Caffe and I am seeing something odd when I train my model. With the same solver prototxt, if I train without a pre-trained model, test accuracy climbs to 0.9+; however, if I finetune from a pre-trained model, accuracy fluctuates between 0.5 and 0.6. How can I solve this and get the finetuned model's test accuracy up to 0.9?
Below are my commands and configuration file:
1 train:
./build/tools/caffe train --solver=solver_cmp.prototxt -gpu 0
2 finetune:
./build/tools/caffe train --solver=solver_cmp.prototxt --weights=pruned_sqznet.caffemodel -gpu 0
3 solver_cmp.prototxt
net: "pruned_sqznet.prototxt"
test_iter: 80
test_interval: 100
base_lr: 0.01
type: "Nesterov"
display: 40
max_iter: 100000
iter_size: 16 #global batch size = batch_size * iter_size
gamma: 0.0001
lr_policy: "poly"
power: 1.0 #linearly decrease LR
momentum: 0.9
weight_decay: 0.005
snapshot: 10000
snapshot_prefix: "examples/crowd/models/cmp_fine"
random_seed: 42
solver_mode: GPU
average_loss: 40
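(A note on the iter_size line above: Caffe accumulates gradients over iter_size forward/backward passes before each update, so if the train batch_size in pruned_sqznet.prototxt were, say, 32, the effective global batch size would be 32 x 16 = 512.)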
Problem description:
I want to train a GRU on a small dataset with 40 training samples. The input sequence length is 3 and the output sequence length is 1. Below is the configuration:
hidden units: 128
input size (input feature num): 11
layers: 2
dropout: 0.5
bidirectional: False
lr: 0.0001
optimizer: Adam
batch size: 8
Below is my model. Thanks to Azhar Khan for helping me format the code.
import torch
import torch.nn as nn
import constants  # project-specific module providing CATEGORY_NUM and BACTERIA_PER_CATEGORY

class GRU(nn.Module):
    def __init__(self, GRU_input_num, GRU_hidden_num, GRU_layer_num, dropout, bidirectional, seed):
        super(GRU, self).__init__()
        torch.manual_seed(seed)
        self.GRU_input_num = GRU_input_num
        self.GRU_hidden_num = GRU_hidden_num
        self.GRU_layer_num = GRU_layer_num
        self.dropout = dropout
        self.bidirectional = bidirectional
        self.direction = 2 if self.bidirectional else 1
        if GRU_layer_num > 1:
            self.gru = nn.GRU(GRU_input_num, GRU_hidden_num, GRU_layer_num, dropout=dropout, batch_first=True,
                              bidirectional=bidirectional)
        else:
            self.gru = nn.GRU(GRU_input_num, GRU_hidden_num, GRU_layer_num, batch_first=True,
                              bidirectional=bidirectional)
        self.predict = nn.Linear(in_features=self.GRU_hidden_num * self.direction * self.GRU_layer_num,
                                 out_features=constants.CATEGORY_NUM * constants.BACTERIA_PER_CATEGORY)

    def forward(self, x):
        """GRU forward. Input shape: [batch, input_step, features]"""
        # shape of hidden state: [layer * direction, batch, hidden]
        batch = x.shape[0]
        _, hidden_state = self.gru(x)
        hidden_state = hidden_state.permute(1, 0, 2)  # [batch, layer * direction, hidden]
        hidden_state = hidden_state.contiguous().view(
            batch, self.GRU_hidden_num * self.direction * self.GRU_layer_num)  # [batch, layer * direction * hidden]
        return self.predict(hidden_state), hidden_state.cpu().detach().numpy()  # [batch, features]
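To make the shapes concrete, here is a minimal smoke test of the model above; it is a sketch assuming the hyperparameters listed earlier (11 input features, 128 hidden units, 2 layers) and that the constants module is importable. The seed value is arbitrary:

model = GRU(GRU_input_num=11, GRU_hidden_num=128, GRU_layer_num=2,
            dropout=0.5, bidirectional=False, seed=0)
x = torch.randn(8, 3, 11)    # [batch=8, input_step=3, features=11]
pred, hidden = model(x)
print(hidden.shape)          # (8, 256): layer_num * direction * hidden = 2 * 1 * 128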
I am curious about how the hidden state changes during training.
My observations:
Heatmap of hidden state.
I printed a heatmap of the hidden state every 25 epochs. The x axis is the sample index within one batch, the y axis is the hidden state dimension, and the title shows the sum of all hidden state values. The figures are shown below:
(heatmap figures: after 25, 75, 150, 300, and 550 epochs)
The summation keeps shrinking as training proceeds.
Weights of GRU.
I also glanced at the weights of the GRU; below is a snapshot of some of them.
(figure: snapshot of GRU weights)
Almost all of the weights end with 'e-40' (or e-41, etc.), which means every single weight is close to zero.
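A quick way to quantify this shrinkage (a sketch, assuming the model instance from the smoke test above) is to print the norm of each GRU parameter tensor and watch it across epochs:

# Overall magnitude of each GRU parameter tensor; values near zero
# would confirm the observation from the weight snapshot.
for name, p in model.gru.named_parameters():
    print(name, p.detach().norm().item())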
My question:
Is this a normal phenomenon? If not, what could be the cause of this issue?
Thanks to anyone who can give me some comments on this, and do let me know what other information you need.
I'm trying to implement a DQN model for the game Pong. However, it still behaves essentially randomly even after about 1000 episodes; the CNN training does not seem to improve the agent.
Here is my main code:
I create a CNN with three convolution layers, each followed by pooling, and three fully connected layers. The number of input channels equals the number of stacked, pre-processed frames (each observation goes from 3x210x160 to 4x84x84, so there are 4 channels):
import torch as th
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self, s_channels, a_space):
        super(CNN, self).__init__()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=1)
        self.conv1 = nn.Conv2d(s_channels, out_channels=32, kernel_size=8, stride=4)
        self.conv2 = nn.Conv2d(32, 64, 4, 2)
        self.conv3 = nn.Conv2d(64, 64, 3, 1)
        self.fc1 = nn.Linear(64 * 4 * 4, 1024)
        self.fc2 = nn.Linear(1024, 512)
        self.fc3 = nn.Linear(512, a_space)

    def forward(self, input):
        output = self.pool(F.relu(self.conv1(input)))
        output = self.pool(F.relu(self.conv2(output)))
        output = self.pool(F.relu(self.conv3(output)))
        output = output.view(-1, 64 * 4 * 4)
        output = F.relu(self.fc1(output))
        output = F.relu(self.fc2(output))
        output = F.relu(self.fc3(output))
        return output
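As a quick sanity check on the flattened size 64*4*4 (a sketch assuming the 4x84x84 stacked input and Pong's 6 actions):

net = CNN(s_channels=4, a_space=6)
dummy = th.zeros(1, 4, 84, 84)
# conv1: 84 -> 20, pool -> 19; conv2: 19 -> 8, pool -> 7; conv3: 7 -> 5, pool -> 4
print(net(dummy).shape)   # torch.Size([1, 6])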
After that, I construct an agent class with action-selection and CNN-training functions. In the training function, I compute the loss from a batched input instead of looping over the batch step by step. Before computing the loss and calling backward, I convert the sampled transitions into batched tensors. Here is the agent class:
import random
from collections import deque

class Agent():
    def __init__(self, s_space, a_space, device) -> None:
        # set GPU device to cuda
        self.device = device
        # define parameters
        self.epsilon = 1.0
        self.min_epsilon = 0.01
        self.dr = 0.995
        self.lr = 0.001
        self.gamma = 0.9
        # define models
        self.evl_net = CNN(s_space, a_space).to(self.device)
        self.tgt_net = CNN(s_space, a_space).to(self.device)
        self.cert = nn.SmoothL1Loss()
        self.optimal = th.optim.Adam(self.evl_net.parameters(), lr=self.lr)
        # define memory store
        self.memory = deque(maxlen=2000)

    # pre-process the input image data
    def data_pre_process(self, batch_size):
        s_v = []
        a_v = []
        next_s_v = []
        r_v = []
        dones = []
        materials = random.sample(self.memory, batch_size)
        for t in materials:
            s_v.append(t[0])
            a_v.append(t[1])
            next_s_v.append(t[2])
            r_v.append(t[3])
            dones.append(t[4])
        s_v = th.Tensor(s_v).to(self.device)
        a_v = th.LongTensor(a_v).unsqueeze(1).to(self.device)
        r_v = th.FloatTensor(r_v).to(device)  # note: refers to the global device, not self.device
        # print(r_v.shape)
        return s_v, a_v, next_s_v, r_v, dones

    # record the transformed images
    def record(self, tpl):
        self.memory.append(tpl)

    # select actions according to the states (input images with 4 channels)
    def select(self, state, a_space):
        actions = self.evl_net(state).data.tolist()
        if random.random() <= self.epsilon:
            action = random.randint(0, a_space - 1)
        else:
            action = actions.index(max(actions))
        return action

    # save CNN model
    def save(self):
        th.save(self.evl_net.state_dict(), "./Pong.pth")

    # at the beginning load the saved CNN model
    def load(self, s_channels, a_space):
        self.evl_net = CNN(s_channels, a_space).to(self.device)
        self.evl_net.load_state_dict(th.load("./Pong.pth"))

    # DQN replay progression
    def train(self, state, batch_size):
        """
        s_v size:      [batch_size, 4, 84, 84]  type: Tensor
        a_v size:      [batch_size, 1]          type: Tensor
        next_s_v size: [batch_size, 4, 84, 84]  type: List
        r_v size:      [1, batch_size]          type: Tensor
        dones size:    [batch_size]             type: List
        """
        s_v, a_v, next_s_v, r_v, dones = self.data_pre_process(batch_size)
        self.tgt_net.load_state_dict(self.evl_net.state_dict())
        # create evl_Q_value tensor
        evl_Q_value = self.evl_net(s_v).gather(0, a_v)  # size: [batch_size, 6].gather() -> [batch_size, 1] Type: Tensor
        # correctly transform next_s_v into tensor:
        nonDone_index = th.LongTensor(tuple([i for i, x in enumerate(dones) if x != True])).to(self.device)
        tgt_Q_value = th.zeros(batch_size).to(device)  # note: global device again
        true_next_s_v = list(filter((None).__ne__, next_s_v))  # pop the "None" elements
        true_next_s_v = th.FloatTensor(true_next_s_v).to(self.device)  # size: [notDone_batch_size, 4, 84, 84]
        # print(true_next_s_v.shape)
        tgt = self.tgt_net(true_next_s_v).max(1)[0].detach()  # size: [1, notDone_batch_size] Type: Tensor
        # print(tgt.shape)
        # update tgt_Q_value
        tgt_Q_value[nonDone_index] = tgt
        tgt_Q_value = r_v + self.gamma * tgt_Q_value
        tgt_Q_value = tgt_Q_value.reshape(batch_size, 1)  # size: [batch_size, 1] cannot be back propagated
        # print(tgt_Q_value)
        self.optimal.zero_grad()
        loss = self.cert(evl_Q_value, tgt_Q_value)
        loss.backward()
        # constrain the gradient from explosion
        for p in self.evl_net.parameters():
            p.grad.data.clamp_(-1, 1)
        self.optimal.step()
        # decay epsilon (exploration rate)
        if self.epsilon > self.min_epsilon:
            self.epsilon *= self.dr
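For reference, the epsilon schedule above decays multiplicatively once per train() call: after n calls, epsilon is roughly 0.995^n (floored at 0.01). Since train() runs every 4 episodes, episode 800 corresponds to about 200 calls and 0.995^200 ≈ 0.37, which matches the printed log further below.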
In the main training loop, I let the batch size grow from 32 to 64 to speed things up. The CNN is updated every four episodes, and statistics are printed every ten episodes.
import gym
import numpy as np
import matplotlib.pyplot as plt

# set GPU device to cuda
device = th.device("cuda:0" if th.cuda.is_available() else "cpu")
# set episode count and batch_size
episodes = 5000
batch_size = 32
env = gym.make("PongNoFrameskip-v4")
env = gym.wrappers.AtariPreprocessing(env, noop_max=30, frame_skip=4, screen_size=84,
                                      terminal_on_life_loss=True, grayscale_obs=True,
                                      grayscale_newaxis=False, scale_obs=False)
# create frame stack for the input image data (size: (4, 84, 84))
env = gym.wrappers.FrameStack(env, 4)
channels = env.observation_space.shape[0]
a_space = env.action_space.n
agent = Agent(channels, a_space, device)
agent.load(channels, a_space)
# training start:
for e in range(episodes):
    # step 1: reset the agent at the beginning
    s = np.array(env.reset())
    img = plt.imshow(env.render('rgb_array'))
    done = False
    score = 0
    while not done:
        # step 2: iterate actions
        a = agent.select(th.Tensor(s).unsqueeze(0).to(device), a_space)
        next_s, reward, done, _ = env.step(a)
        if done == True:
            reward = -1.0
            next_s = None
        else:
            next_s = np.array(next_s)
            # print(next_s.shape)
        # step 3: record the data into the buffer
        dataset = (s, a, next_s, reward, done)
        agent.record(dataset)
        # step 4: update state steps
        s = next_s
        score += reward
    # step 5: train and update the CNN every 4 episodes
    if len(agent.memory) > batch_size and e % 4 == 0:
        agent.train(channels, batch_size)
        agent.save()
    # appendix 1: at the beginning increase batch_size from 32 to 64
    if batch_size < 64:
        batch_size += 1
    # appendix 2: report the score every 10 episodes
    if e % 10 == 0 and len(agent.memory) > batch_size:
        print("episodes:", e, "score:", score, "epsilon: {:.2}".format(agent.epsilon))
No errors are raised while it runs. However, the agent does not perform as expected: after 1000 episodes it still returns scores as negative as at the very start. The output looks like this:
episodes: 800 score: -20.0 epsilon: 0.37
episodes: 810 score: -21.0 epsilon: 0.36
episodes: 820 score: -21.0 epsilon: 0.36
episodes: 830 score: -21.0 epsilon: 0.35
episodes: 840 score: -21.0 epsilon: 0.35
episodes: 850 score: -21.0 epsilon: 0.34
episodes: 860 score: -21.0 epsilon: 0.34
episodes: 870 score: -21.0 epsilon: 0.34
episodes: 880 score: -20.0 epsilon: 0.33
episodes: 890 score: -21.0 epsilon: 0.33
episodes: 900 score: -20.0 epsilon: 0.32
episodes: 910 score: -21.0 epsilon: 0.32
episodes: 920 score: -21.0 epsilon: 0.31
episodes: 930 score: -21.0 epsilon: 0.31
episodes: 940 score: -21.0 epsilon: 0.31
episodes: 950 score: -21.0 epsilon: 0.3
episodes: 960 score: -21.0 epsilon: 0.3
episodes: 970 score: -21.0 epsilon: 0.3
episodes: 980 score: -21.0 epsilon: 0.29
I have rechecked the structure of the model against the algorithm's theory but found no difference. I hope to get some advice and help on how to deal with this problem.
This is the training and validation cell for a multi-label classification task using RoBERTa (BERT). The first part is training and the second part is validation. train_dataloader is my training dataset and dev_dataloader is the validation dataset. My question is: why does the training loss decrease step by step while accuracy barely increases? In practice, accuracy increases until iteration 4, but the training loss keeps decreasing until the last epoch (iteration). Is this okay, or does it indicate a problem?
import torch
from torch.nn import BCEWithLogitsLoss
from tqdm import trange
from sklearn.metrics import jaccard_score

# model, optimizer, device, num_labels, train_dataloader and dev_dataloader
# are defined earlier in the notebook.
train_loss_set = []
iterate = 4
for _ in trange(iterate, desc="Iterate"):
    model.train()
    train_loss = 0
    nu_train_examples, nu_train_steps = 0, 0
    for step, batch in enumerate(train_dataloader):
        batch = tuple(t.to(device) for t in batch)
        batch_input_ids, batch_input_mask, batch_labels = batch
        optimizer.zero_grad()
        output = model(batch_input_ids, attention_mask=batch_input_mask)
        logits = output[0]
        loss_function = BCEWithLogitsLoss()
        loss = loss_function(logits.view(-1, num_labels),
                             batch_labels.type_as(logits).view(-1, num_labels))
        train_loss_set.append(loss.item())
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        nu_train_examples += batch_input_ids.size(0)
        nu_train_steps += 1
    print("Train loss: {}".format(train_loss / nu_train_steps))
    ###########################################################################
    model.eval()
    logits_pred, true_labels, pred_labels, tokenized_texts = [], [], [], []
    # Predict
    for i, batch in enumerate(dev_dataloader):
        batch = tuple(t.to(device) for t in batch)
        batch_input_ids, batch_input_mask, batch_labels = batch
        with torch.no_grad():
            out = model(batch_input_ids, attention_mask=batch_input_mask)
            batch_logit_pred = out[0]
            pred_label = torch.sigmoid(batch_logit_pred)
            batch_logit_pred = batch_logit_pred.detach().cpu().numpy()
            pred_label = pred_label.to('cpu').numpy()
            batch_labels = batch_labels.to('cpu').numpy()
        tokenized_texts.append(batch_input_ids)
        logits_pred.append(batch_logit_pred)
        true_labels.append(batch_labels)
        pred_labels.append(pred_label)
    pred_labels = [item for sublist in pred_labels for item in sublist]
    true_labels = [item for sublist in true_labels for item in sublist]
    threshold = 0.4
    pred_bools = [pl > threshold for pl in pred_labels]
    true_bools = [tl == 1 for tl in true_labels]
    print("Accuracy is: ", jaccard_score(true_bools, pred_bools, average='samples'))
    torch.save(model.state_dict(), 'bert_model')
and the outputs:
Iterate: 0%| | 0/10 [00:00<?, ?it/s]
Train loss: 0.4024542534684801
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py:1272: UndefinedMetricWarning: Jaccard is ill-defined and being set to 0.0 in samples with no true or predicted labels. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Accuracy is: 0.5806403013182674
Iterate: 10%|█ | 1/10 [03:21<30:14, 201.64s/it]
Train loss: 0.2972540049911379
Accuracy is: 0.6091337099811676
Iterate: 20%|██ | 2/10 [06:49<27:07, 203.49s/it]
Train loss: 0.26178574864264137
Accuracy is: 0.608361581920904
Iterate: 30%|███ | 3/10 [10:17<23:53, 204.78s/it]
Train loss: 0.23612180122962365
Accuracy is: 0.6096717783158462
Iterate: 40%|████ | 4/10 [13:44<20:33, 205.66s/it]
Train loss: 0.21416303515434265
Accuracy is: 0.6046892655367231
Iterate: 50%|█████ | 5/10 [17:12<17:11, 206.27s/it]
Train loss: 0.1929110718982203
Accuracy is: 0.6030885122410546
Iterate: 60%|██████ | 6/10 [20:40<13:46, 206.74s/it]
Train loss: 0.17280191068465894
Accuracy is: 0.6003766478342749
Iterate: 70%|███████ | 7/10 [24:08<10:21, 207.04s/it]
Train loss: 0.1517329115446631
Accuracy is: 0.5864783427495291
Iterate: 80%|████████ | 8/10 [27:35<06:54, 207.23s/it]
Train loss: 0.12957811209705325
Accuracy is: 0.5818832391713747
Iterate: 90%|█████████ | 9/10 [31:03<03:27, 207.39s/it]
Train loss: 0.11256680189521162
Accuracy is: 0.5796045197740114
Iterate: 100%|██████████| 10/10 [34:31<00:00, 207.14s/it]
The training loss is decreasing because your model gradually learns your training set. The evaluation accuracy reflects how well the model has learned the general features of your training set and how well it predicts unseen data. So, if the loss is decreasing, your model is learning; but perhaps it has learned information too specific to the training set and is, in fact, overfitting. That means it fits the training data "too well" and is unable to make correct predictions on unseen data, because the test data can differ slightly. That is why the evaluation accuracy is not increasing any more.
This could be an explanation.
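As a side note on the "accuracy" printed above: jaccard_score with average='samples' computes, per sample, the intersection-over-union between the true and predicted label sets, then averages over samples. A tiny worked example with hypothetical data:

from sklearn.metrics import jaccard_score
import numpy as np

true = np.array([[1, 0, 1], [0, 1, 0]])   # two samples, three labels
pred = np.array([[1, 0, 0], [0, 1, 0]])
# sample 1: intersection has 1 label, union has 2 -> 0.5; sample 2: 1/1 -> 1.0
print(jaccard_score(true, pred, average='samples'))   # (0.5 + 1.0) / 2 = 0.75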
I have Lasagne code and I want to create the same network using Caffe. I could convert the network, but I need help with the hyperparameters in Lasagne. They look like:
import theano.tensor as T
import lasagne

# net and target_var are defined elsewhere
lr = 1e-2
weight_decay = 1e-5
prediction = lasagne.layers.get_output(net['out'])
loss = T.mean(lasagne.objectives.squared_error(prediction, target_var))
weightsl2 = lasagne.regularization.regularize_network_params(net['out'], lasagne.regularization.l2)
loss += weight_decay * weightsl2
How do I perform the L2 regularization part in Caffe? Do I have to add a regularization layer after each convolution/inner-product layer? The relevant parts of my solver.prototxt are below:
base_lr: 0.01
lr_policy: "fixed"
weight_decay: 0.00001
regularization_type: "L2"
stepsize: 300
gamma: 0.1
max_iter: 2000
momentum: 0.9
Also posted on http://datascience.stackexchange.com; waiting for answers.
It seems like you already got it right.
The weight_decay meta-parameter combined with regularization_type: "L2" in your solver.prototxt tells Caffe to use L2 regularization with weight_decay = 1e-5.
One more thing you might want to tweak is how much the regularization affects each parameter. You can set this per parameter blob in the net via
param { decay_mult: 1 }
For example, an "InnerProduct" layer with bias has two parameters:
layer {
type: "InnerProduct"
name: "fc1"
# bottom and top here
inner_product_param {
bias_term: true
# ... other params
}
param { decay_mult: 1 } # for weights use regularization
param { decay_mult: 0 } # do not regularize the bias
}
By default, decay_mult is set to 1, that is, all weights of the net are regularized equally. You can change it to regularize specific parameter blobs more or less.
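To connect the two formulations, here is an illustrative sketch (hypothetical values only): Lasagne adds the penalty to the loss explicitly, while Caffe's solver folds weight_decay directly into each parameter's gradient. The two conventions can differ by a factor of two, which is worth verifying if exact equivalence matters:

import numpy as np

w = np.array([0.5, -0.3, 0.8])    # hypothetical parameter values
weight_decay = 1e-5

lasagne_penalty = weight_decay * np.sum(w ** 2)   # term added to the loss
lasagne_grad = 2 * weight_decay * w               # its gradient w.r.t. w
caffe_grad = weight_decay * w                     # what the solver adds to each gradient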
I am trying to implement pixel-wise binary classification for images using Caffe. Each image has dimension 3x256x256, and for each one I have a 256x256 label array in which every entry is marked as either 0 or 1. When I read my HDF5 file using the code below,
import os
import h5py
import numpy as np

dirname = "examples/hdf5_classification/data"
f = h5py.File(os.path.join(dirname, 'train.h5'), "r")
ks = f.keys()
data = np.array(f[ks[0]])
label = np.array(f[ks[1]])
print "Data dimension from HDF5", np.shape(data)
print "Label dimension from HDF5", np.shape(label)
I get the data and label dimension as
Data dimension from HDF5 (402, 3, 256, 256)
Label dimension from HDF5 (402, 256, 256)
I am trying to feed this data into the given HDF5 classification network, and while training I get the following output (using the default solver, but in GPU mode).
!cd /home/unni/MTPMain/caffe-master/ && ./build/tools/caffe train -solver examples/hdf5_classification/solver.prototxt
gives
I1119 01:29:02.222512 11910 caffe.cpp:184] Using GPUs 0
I1119 01:29:02.509752 11910 solver.cpp:47] Initializing solver from parameters:
train_net: "examples/hdf5_classification/train_val.prototxt"
test_net: "examples/hdf5_classification/train_val.prototxt"
test_iter: 250
test_interval: 1000
base_lr: 0.01
display: 1000
max_iter: 10000
lr_policy: "step"
gamma: 0.1
momentum: 0.9
weight_decay: 0.0005
stepsize: 5000
snapshot: 10000
snapshot_prefix: "examples/hdf5_classification/data/train"
solver_mode: GPU
device_id: 0
I1119 01:29:02.519805 11910 solver.cpp:80] Creating training net from train_net file: examples/hdf5_classification/train_val.prototxt
I1119 01:29:02.520031 11910 net.cpp:322] The NetState phase (0) differed from the phase (1) specified by a rule in layer data
I1119 01:29:02.520053 11910 net.cpp:322] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy
I1119 01:29:02.520104 11910 net.cpp:49] Initializing net from parameters:
name: "LogisticRegressionNet"
state {
phase: TRAIN
}
layer {
name: "data"
type: "HDF5Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
hdf5_data_param {
source: "examples/hdf5_classification/data/train.txt"
batch_size: 10
}
}
layer {
name: "fc1"
type: "InnerProduct"
bottom: "data"
top: "fc1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 2
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc1"
bottom: "label"
top: "loss"
}
I1119 01:29:02.520256 11910 layer_factory.hpp:76] Creating layer data
I1119 01:29:02.520277 11910 net.cpp:106] Creating Layer data
I1119 01:29:02.520290 11910 net.cpp:411] data -> data
I1119 01:29:02.520331 11910 net.cpp:411] data -> label
I1119 01:29:02.520352 11910 hdf5_data_layer.cpp:80] Loading list of HDF5 filenames from: examples/hdf5_classification/data/train.txt
I1119 01:29:02.529341 11910 hdf5_data_layer.cpp:94] Number of HDF5 files: 1
I1119 01:29:02.542645 11910 hdf5.cpp:32] Datatype class: H5T_FLOAT
I1119 01:29:10.601307 11910 net.cpp:150] Setting up data
I1119 01:29:10.612926 11910 net.cpp:157] Top shape: 10 3 256 256 (1966080)
I1119 01:29:10.612963 11910 net.cpp:157] Top shape: 10 256 256 (655360)
I1119 01:29:10.612969 11910 net.cpp:165] Memory required for data: 10485760
I1119 01:29:10.612983 11910 layer_factory.hpp:76] Creating layer fc1
I1119 01:29:10.624948 11910 net.cpp:106] Creating Layer fc1
I1119 01:29:10.625015 11910 net.cpp:454] fc1 <- data
I1119 01:29:10.625039 11910 net.cpp:411] fc1 -> fc1
I1119 01:29:10.645814 11910 net.cpp:150] Setting up fc1
I1119 01:29:10.645864 11910 net.cpp:157] Top shape: 10 2 (20)
I1119 01:29:10.645875 11910 net.cpp:165] Memory required for data: 10485840
I1119 01:29:10.645912 11910 layer_factory.hpp:76] Creating layer loss
I1119 01:29:10.657094 11910 net.cpp:106] Creating Layer loss
I1119 01:29:10.657133 11910 net.cpp:454] loss <- fc1
I1119 01:29:10.657147 11910 net.cpp:454] loss <- label
I1119 01:29:10.657163 11910 net.cpp:411] loss -> loss
I1119 01:29:10.657189 11910 layer_factory.hpp:76] Creating layer loss
F1119 01:29:14.883095 11910 softmax_loss_layer.cpp:42] Check failed: outer_num_ * inner_num_ == bottom[1]->count() (10 vs. 655360) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}.
*** Check failure stack trace: ***
# 0x7f0652e1adaa (unknown)
# 0x7f0652e1ace4 (unknown)
# 0x7f0652e1a6e6 (unknown)
# 0x7f0652e1d687 (unknown)
# 0x7f0653494219 caffe::SoftmaxWithLossLayer<>::Reshape()
# 0x7f065353f50f caffe::Net<>::Init()
# 0x7f0653541f05 caffe::Net<>::Net()
# 0x7f06535776cf caffe::Solver<>::InitTrainNet()
# 0x7f0653577beb caffe::Solver<>::Init()
# 0x7f0653578007 caffe::Solver<>::Solver()
# 0x7f06535278b3 caffe::Creator_SGDSolver<>()
# 0x410831 caffe::SolverRegistry<>::CreateSolver()
# 0x40a16b train()
# 0x406908 main
# 0x7f065232cec5 (unknown)
# 0x406e28 (unknown)
# (nil) (unknown)
Aborted
Basically the error is
softmax_loss_layer.cpp:42] Check failed:
outer_num_ * inner_num_ == bottom[1]->count() (10 vs. 655360)
Number of labels must match number of predictions;
e.g., if softmax axis == 1 and prediction shape is (N, C, H, W),
label count (number of labels) must be N*H*W,
with integer values in {0, 1, ..., C-1}.
I am not able to understand why the number of labels expected is the same as my batch size. How exactly should I tackle this problem? Is it a problem with my labeling method?
Your problem is that the "SoftmaxWithLoss" layer tries to compare a prediction vector of 2 elements per input image to a label of size 256-by-256 per image. This makes no sense.
Root cause of the error: I guess what you tried to do is apply a binary classifier to each pixel in the image. To that end you defined "fc1" as an "InnerProduct" layer with num_output: 2. However, the way Caffe sees this, you have a single binary classifier applied to the entire image, so Caffe gives you a single binary prediction for the entire image.
How to solve it: for pixel-wise predictions you no longer need "InnerProduct" layers; you have a "fully convolutional net". Replace "fc1" with a conv layer (for instance, a kernel that examines the 5-by-5 neighborhood of each pixel and makes a decision based on this patch):
layer {
name: "bin_class"
type: "Convolution"
bottom: "data"
top: "bin_class"
convolution_param {
num_output: 2 # binary class output
kernel_size: 5 # 5-by-5 patch for prediction
pad: 2 # make sure spatial output size equals size of label
}
}
Now applying "SoftmaxWithLoss" to bottom: bin_class and bottom: label should work.
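For the dimensions in this question the arithmetic works out: with kernel_size: 5, pad: 2 and the default stride 1, the spatial output size is (256 + 2*2 - 5)/1 + 1 = 256, so the prediction blob is N x 2 x 256 x 256 and the label count N*256*256 now matches outer_num_ * inner_num_ in the failed check.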