Runtime error in WGAN-GP algorithm when running on GPU - deep-learning

I am a newbie in PyTorch, running the WGAN-GP algorithm on Google Colab using the GPU runtime. I encountered the error below. The algorithm works fine on the None runtime, i.e. CPU.
Error generated during training
0%| | 0/3 [00:00<?, ?it/s]
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-18-7e1d4849a60a> in <module>
19 # Calculate gradient penalty on real and fake images
20 # generated by generator
---> 21 gp = gradient_penalty(netCritic, real_image, fake, device)
22 critic_loss = -(torch.mean(critic_real_pred)
23 - torch.mean(critic_fake_pred)) + LAMBDA_GP * gp
<ipython-input-15-f84354d74f37> in gradient_penalty(netCritic, real_image, fake_image, device)
8 # image
9 # interpolated image ← alpha *real image + (1 − alpha) fake image
---> 10 interpolated_image = (alpha*real_image) + (1-alpha) * fake_image
11
12 # calculate the critic score on the interpolated image
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Snippet of my WGAN-GP code:
def gradient_penalty(netCritic, real_image, fake_image, device=device):
    batch_size, channel, height, width = real_image.shape

    # alpha is selected randomly between 0 and 1
    alpha = torch.rand(batch_size, 1, 1, 1).repeat(1, channel, height, width)

    # interpolated image = randomly weighted average between a real and fake
    # image
    # interpolated image ← alpha * real image + (1 − alpha) * fake image
    interpolated_image = (alpha * real_image) + (1 - alpha) * fake_image

    # calculate the critic score on the interpolated image
    interpolated_score = netCritic(interpolated_image)

    # take the gradient of the score wrt the interpolated image
    gradient = torch.autograd.grad(inputs=interpolated_image,
                                   outputs=interpolated_score,
                                   retain_graph=True,
                                   create_graph=True,
                                   grad_outputs=torch.ones_like(interpolated_score)
                                   )[0]
    gradient = gradient.view(gradient.shape[0], -1)
    gradient_norm = gradient.norm(2, dim=1)
    gradient_penalty = torch.mean((gradient_norm - 1)**2)
    return gradient_penalty
n_epochs = 2000
cur_step = 0
LAMBDA_GP = 10
display_step = 50
CRITIC_ITERATIONS = 5
nz = 100

for epoch in range(n_epochs):
    # Dataloader returns the batches
    for real_image, _ in tqdm(dataloader):
        cur_batch_size = real_image.shape[0]
        real_image = real_image.to(device)

        for _ in range(CRITIC_ITERATIONS):
            fake_noise = get_noise(cur_batch_size, nz, device=device)
            fake = netG(fake_noise)
            critic_fake_pred = netCritic(fake).reshape(-1)
            critic_real_pred = netCritic(real_image).reshape(-1)

            # Calculate gradient penalty on real and fake images
            # generated by generator
            gp = gradient_penalty(netCritic, real_image, fake, device)
            critic_loss = -(torch.mean(critic_real_pred)
                            - torch.mean(critic_fake_pred)) + LAMBDA_GP * gp
            netCritic.zero_grad()
            # To make a backward pass and retain the intermediary results
            critic_loss.backward(retain_graph=True)
            optimizerCritic.step()

        # Train Generator: max E[critic(gen_fake)] <-> min -E[critic(gen_fake)]
        gen_fake = netCritic(fake).reshape(-1)
        gen_loss = -torch.mean(gen_fake)
        netG.zero_grad()
        gen_loss.backward()
        # Update optimizer
        optimizerG.step()

        ## Visualization code ##
        if cur_step % display_step == 0 and cur_step > 0:
            print(f"Step{cur_step}: GenLoss: {gen_loss}: CLoss: {critic_loss}")
            display_images(fake)
            display_images(real_image)
            gen_loss = 0
            critic_loss = 0
        cur_step += 1
I tried to introduce .cuda() at lines 10 and 21 indicated in the error output, but it is not working.

Here is one approach to solving this kind of error:
Read the error message and locate the exact line where it occurred:
... in gradient_penalty(netCritic, real_image, fake_image, device)
8 # image
9 # interpolated image ← alpha *real image + (1 − alpha) fake image
---> 10 interpolated_image = (alpha*real_image) + (1-alpha) * fake_image
11
12 # calculate the critic score on the interpolated image
RuntimeError: Expected all tensors to be on the same device,
but found at least two devices, cuda:0 and cpu!
Look for input tensors that have not been properly transferred to the correct device. Then look for intermediate tensors that have not been transferred.
Here, alpha is assigned a random tensor, but no transfer to the device is done:
>>> alpha = torch.rand(batch_size, 1, 1, 1) \
.repeat(1, channel, height, width)
Fix the issue and test:
>>> alpha = torch.rand(batch_size, 1, 1, 1, device=fake_image.device) \
.repeat(1, channel, height, width)
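For completeness, here is a sketch of gradient_penalty with that single fix applied (same names as in the question; it uses device=fake_image.device as above, though passing the device argument explicitly works just as well):

import torch

def gradient_penalty(netCritic, real_image, fake_image, device=None):
    batch_size, channel, height, width = real_image.shape

    # alpha is now created directly on the same device as the images
    alpha = torch.rand(batch_size, 1, 1, 1, device=fake_image.device) \
        .repeat(1, channel, height, width)

    # interpolated image <- alpha * real image + (1 - alpha) * fake image
    interpolated_image = alpha * real_image + (1 - alpha) * fake_image

    # critic score on the interpolated image
    interpolated_score = netCritic(interpolated_image)

    # gradient of the score w.r.t. the interpolated image
    gradient = torch.autograd.grad(inputs=interpolated_image,
                                   outputs=interpolated_score,
                                   grad_outputs=torch.ones_like(interpolated_score),
                                   create_graph=True,
                                   retain_graph=True)[0]
    gradient = gradient.view(gradient.shape[0], -1)
    gradient_norm = gradient.norm(2, dim=1)
    return torch.mean((gradient_norm - 1) ** 2)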

Related

PyTorch: Multi-class segmentation loss value != 0 when using target image as the prediction

I was performing semantic segmentation using PyTorch. There are a total of 103 different classes in the dataset, and the targets are RGB images with only the red channel containing the labels. I was using nn.CrossEntropyLoss as my loss function. For sanity, I wanted to check whether using nn.CrossEntropyLoss is correct for this problem and whether it has the expected behaviour.
I pick a random mask from my dataset and create a categorical version of it using this custom transform:
class ToCategorical:
    def __init__(self, n_classes: int) -> None:
        self.n_classes = n_classes

    def __call__(self, sample: torch.Tensor):
        mask = sample.permute(1, 2, 0)
        categories = torch.unique(mask).tolist()[1:]  # get all categories other than 0
        # build a tensor with `n_classes` channels
        one_hot_image = torch.zeros(self.n_classes, *mask.shape[:-1])
        for category in categories:
            # get spatial locs where the category is present
            rows, cols, _ = torch.where(mask == category)
            # in the same spatial loc but in the `category` channel, fill 1
            one_hot_image[category, rows, cols] = 1
        return one_hot_image
And then I send this image as the output (prediction) and use the ground truth mask as the target to the loss function.
import torch.nn as nn
mask = T.PILToTensor()(Image.open("path_to_image").convert("RGB"))
categorical_mask = ToCategorical(103)(mask).unsqueeze(0)
mask = mask[0].unsqueeze(0) # get only the red channel, add fake batch_dim
loss_fn = nn.CrossEntropyLoss()
target = mask
output = categorical_mask
print(output.shape, target.shape)
print(loss_fn(output, target.to(torch.long)))
I expected the loss to be zero but to my surprise, the output is as follows
torch.Size([1, 103, 600, 800]) torch.Size([1, 600, 800])
tensor(4.2836)
I verified with other samples in the dataset and I obtained similar values for other masks as well. Am I doing something wrong? I expect the loss to be = 0 when the output is the same as the target.
PS: I also know that nn.CrossEntropyLoss is the same as using log_softmax followed by nn.NLLLoss(), but I obtained the same value when using NLLLoss as well.
For Reference
Dataset used: UECFoodPixComplete
I would like to address this:
I expect the loss to be = 0 when the output is the same as the target.
Even if the prediction matches the target, i.e. the prediction corresponds to a one-hot encoding of the labels contained in the dense target tensor, the loss itself is not supposed to equal zero. In fact, it can never equal zero, because the nn.CrossEntropyLoss function is always positive by definition.
Let us take a minimal example with #C classes, a target y_true, and a prediction y_pred consisting of perfect (one-hot) predictions:
As a quick reminder:
The softmax is applied on the logits q_i as p_i = exp(q_i) / sum_j(exp(q_j)):
>>> p = F.softmax(y_pred, 1)
Similarly if you are using the log-softmax, defined as logp_i = log(p_i):
>>> logp = F.log_softmax(y_pred, 1)
Then comes the negative log-likelihood function, computed between the input x and the target y: -y*x. In association with the softmax, it comes down to -y*p, or -y*logp respectively. In any case, whether you apply the log or not, only the predictions corresponding to the true classes remain, since the other ones are zeroed out.
That being said, applying NLLLoss directly on y_pred would indeed result in 0, as you expected in your question. However, here we apply it on the probability distribution or log-probability: p, or logp respectively!
In our specific case, the logit is q_i = 1 for the true class and q_i = 0 for all other classes (there are #C - 1 of those). This means the softmax probability of the true class equals exp(1)/sum_j(exp(q_j)), and since sum_j(exp(q_j)) = (#C - 1)*exp(0) + exp(1) = #C - 1 + e, we therefore have:
softmax(y_pred) = e / (#C - 1 + e)
Similarly for the log-softmax:
log-softmax(y_pred) = log(e / (#C - 1 + e)) = 1 - log(#C - 1 + e)
If we proceed by applying the negative likelihood function we simply get cross-entropy(y_pred, y_true) = (nllloss o log-softmax)(y_pred, y_true). This results in:
loss = - (1 - log(#C - 1 + e)) = log(#C - 1 + e) - 1
This effectively corresponds to the minimum of the nn.CrossEntropyLoss function.
Regarding your specific case where #C = 103, you may have an issue in your code... since the average loss should equal log(102 + e) - 1, i.e. around 3.65.
>>> y_true = torch.randint(0,103,(1,1,2,5))
>>> y_pred = torch.zeros(1,103,2,5).scatter(1, y_true, value=1)
You can see for yourself with one of the provided methods:
the builtin function nn.functional.cross_entropy:
>>> F.cross_entropy(y_pred, y_true[:,0])
tensor(3.6513)
manually computing the quantity:
>>> logp = F.log_softmax(y_pred, 1)
>>> -logp.gather(1, y_true).mean()
tensor(3.6513)
analytical result:
>>> from math import e, log
>>> log(102 + e) - 1
3.6513
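As a side note, the loss only approaches zero when the logits of the true classes dominate the others; here is a quick illustrative check, reusing y_true and y_pred from above (printed values are approximate):
>>> F.cross_entropy(10 * y_pred, y_true[:,0])    # sharper logits: the loss shrinks
tensor(0.0046)
>>> F.cross_entropy(100 * y_pred, y_true[:,0])   # essentially a perfect prediction
tensor(0.)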

Image denoising using Unet architecture

I'm new to computer vision and deep learning. I'm trying to train this U-Net model https://github.com/kevinlu1211/pytorch-unet-resnet-50-encoder with ResNet50 as the encoder. I want to implement it in a way that I pass two RGB images which are first processed by ResNet50, and then the layers are concatenated before being passed to the decoder. I tried doing that and changed n_classes in the code to 3 to output a 3-channel RGB image just like the inputs, but it gives me a distorted image like this, and I don't understand why. Please help me with this.
The part of the code that I modified to process two RGB inputs with ResNet50 is here:
for i, block in enumerate(self.down_blocks, 2):  # for all the down blocks
    x = block(x)
    if i == (UNetWithResnet50Encoder.DEPTH - 1):
        continue
    pre_pools[f"layer_{i}"] = x  # creating all the down sampling layers

pre_pools_inp2 = dict()
pre_pools_inp2[f"layer_0"] = y
y = self.input_block(y)
pre_pools_inp2[f"layer_1"] = y
y = self.input_pool(y)

for i, block in enumerate(self.down_blocks, 2):  # for all the down blocks
    y = block(y)
    if i == (UNetWithResnet50Encoder.DEPTH - 1):
        continue
    pre_pools_inp2[f"layer_{i}"] = y  # creating all the down sampling layers

x = torch.cat([x, y], 1)
x = self.bridge(x)  # this is now the bridge between down sampling and up sampling

for i, block in enumerate(self.up_blocks, 1):
    key = f"layer_{UNetWithResnet50Encoder.DEPTH - 1 - i}"  # now using that bridge for upsampling
    x = block(x, pre_pools[key])

output_feature_map = x
x = self.out(x)
del pre_pools
if with_output_feature_map:
    return x, output_feature_map
else:
    return x
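As a point of reference for the concatenation step above, here is a minimal sketch (with hypothetical shapes, not taken from this model) showing that torch.cat([x, y], 1) doubles the channel dimension, so whatever layer consumes the result, here self.bridge, has to be built for that doubled channel count:

import torch

# hypothetical encoder outputs for the two RGB inputs
x = torch.randn(1, 2048, 8, 8)
y = torch.randn(1, 2048, 8, 8)

cat = torch.cat([x, y], 1)
print(cat.shape)  # torch.Size([1, 4096, 8, 8]) -- the channel count is doubled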

PyTorch: Apply cross-entropy loss with custom weight map

I am solving a multi-class segmentation problem using the U-Net architecture in PyTorch.
As specified in the U-Net paper, I am trying to implement custom weight maps to counter class imbalances.
Below is the operation which I want to apply -
Also, I reduced the batch_size to 1 so that I can remove that dimension while passing it to the precompute_to_masks function.
I tried the approach below:
def precompute_for_image(masks):
    masks = masks.cpu()
    cls = masks.unique()
    res = torch.stack([torch.where(masks == cls_val, torch.tensor(1), torch.tensor(0))
                       for cls_val in cls])
    return res

def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    ###################
    # train the model #
    ###################
    model.train()
    for batch_idx, (data, target) in enumerate(final_train_loader):
        # move to GPU
        if use_cuda:
            data, target = data.cuda(), target.cuda()
        optimizer.zero_grad()
        output = model(data)
        temp_target = precompute_for_image(target)
        w = weight_map(temp_target)
        loss = criterion(output, target)
        loss = w * loss
        loss.backward()
        optimizer.step()
        train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.data - train_loss))
    return model
where weight_map is the function to calculate the weight mask, which I got from here.
The issue I am facing is that I get a memory error when I apply this method. I am using 61 GB of RAM and a Tesla V100 GPU.
I really think I am applying it in an incorrect way. How should it be done?
I am omitting the non-essential details from the training loop.
Below is my weight_map function:
import numpy as np
from skimage.segmentation import find_boundaries

w0 = 10
sigma = 5

def make_weight_map(masks):
    """
    Generate the weight maps as specified in the UNet paper
    for a set of binary masks.

    Parameters
    ----------
    masks: array-like
        A 3D array of shape (n_masks, image_height, image_width),
        where each slice of the matrix along the 0th axis represents one binary mask.

    Returns
    -------
    array-like
        A 2D array of shape (image_height, image_width)
    """
    nrows, ncols = masks.shape[1:]
    masks = (masks > 0).astype(int)
    distMap = np.zeros((nrows * ncols, masks.shape[0]))
    X1, Y1 = np.meshgrid(np.arange(nrows), np.arange(ncols))
    X1, Y1 = np.c_[X1.ravel(), Y1.ravel()].T
    for i, mask in enumerate(masks):
        # find the boundary of each mask,
        # compute the distance of each pixel from this boundary
        bounds = find_boundaries(mask, mode='inner')
        X2, Y2 = np.nonzero(bounds)
        xSum = (X2.reshape(-1, 1) - X1.reshape(1, -1)) ** 2
        ySum = (Y2.reshape(-1, 1) - Y1.reshape(1, -1)) ** 2
        distMap[:, i] = np.sqrt(xSum + ySum).min(axis=0)
    ix = np.arange(distMap.shape[0])
    if distMap.shape[1] == 1:
        d1 = distMap.ravel()
        border_loss_map = w0 * np.exp((-1 * (d1) ** 2) / (2 * (sigma ** 2)))
    else:
        if distMap.shape[1] == 2:
            d1_ix, d2_ix = np.argpartition(distMap, 1, axis=1)[:, :2].T
        else:
            d1_ix, d2_ix = np.argpartition(distMap, 2, axis=1)[:, :2].T
        d1 = distMap[ix, d1_ix]
        d2 = distMap[ix, d2_ix]
        border_loss_map = w0 * np.exp((-1 * (d1 + d2) ** 2) / (2 * (sigma ** 2)))
    xBLoss = np.zeros((nrows, ncols))
    xBLoss[X1, Y1] = border_loss_map
    # class weight map
    loss = np.zeros((nrows, ncols))
    w_1 = 1 - masks.sum() / loss.size
    w_0 = 1 - w_1
    loss[masks.sum(0) == 1] = w_1
    loss[masks.sum(0) == 0] = w_0
    ZZ = xBLoss + loss
    return ZZ
Traceback of the error-
MemoryError Traceback (most recent call last)
<ipython-input-30-f0a595b8de7e> in <module>
1 # train the model
2 model_scratch = train(20, final_train_loader, unet, optimizer,
----> 3 criterion, train_on_gpu, 'model_scratch.pt')
<ipython-input-29-b481b4f3120e> in train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path)
24 loss = criterion(output,target)
25 target.requires_grad = False
---> 26 w = make_weight_map(target)
27 loss = W*loss
28 loss.backward()
<ipython-input-5-e75a6281476f> in make_weight_map(masks)
33 X2, Y2 = np.nonzero(bounds)
34 xSum = (X2.reshape(-1, 1) - X1.reshape(1, -1)) ** 2
---> 35 ySum = (Y2.reshape(-1, 1) - Y1.reshape(1, -1)) ** 2
36 distMap[:, i] = np.sqrt(xSum + ySum).min(axis=0)
37 ix = np.arange(distMap.shape[0])
MemoryError:
Your final_train_loader provides you with the input image data and the expected pixel-wise labeling target. I assume (following PyTorch's conventions) that data is of shape B-3-H-W and of dtype torch.float.
More importantly, target is of shape B-H-W and of dtype torch.long.
On the other hand, make_weight_map expects its input to be C-H-W (with C = the number of classes, NOT the batch size), of type numpy array.
Try providing make_weight_map the input mask as it expects it and see if you get similar errors.
I also recommend that you visualize the resulting weight map - to make sure your function does what you expect it to do.
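A minimal sketch of that conversion, assuming batch_size == 1 and a dense B-H-W target of class indices (target_to_class_masks is a hypothetical helper name, mirroring your precompute_for_image but returning the numpy layout make_weight_map expects):

import numpy as np
import torch

def target_to_class_masks(target: torch.Tensor) -> np.ndarray:
    # target: B-H-W tensor of class indices, with B == 1
    dense = target[0].detach().cpu().numpy()   # H-W array of class ids
    present = np.unique(dense)                 # classes present in this image
    # one binary H-W mask per present class, stacked into an n_masks-H-W array
    return np.stack([(dense == c).astype(int) for c in present])

# w = make_weight_map(target_to_class_masks(target))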

tensorflow GPU crashes for 0 batch size CUDNN_STATUS_BAD_PARAM

This issue seems to have existed for a long time, and lots of users are facing it.
stream_executor/cuda/cuda_dnn.cc:444] could not convert BatchDescriptor {count: 0 feature_map_count: 64 spatial: 7 264 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX} to cudnn tensor descriptor: CUDNN_STATUS_BAD_PARAM
The message is so cryptic that I do not know what happened in my code; however, my code works fine on CPU TensorFlow.
I heard that we can use tf.cond to get around this, but I'm new to tensorflow-gpu, so can someone please help me? My code uses Keras and takes a generator as input; this is to avoid any out-of-memory issues. The generator is built from a while True loop that yields data in batches of some batch size.
def resnet_model(bin_multiple):
    # input and reshape
    inputs = Input(shape=input_shape)
    reshape = Reshape(input_shape_channels)(inputs)
    # normal convnet layer (have to do one initially to get 64 channels)
    conv = Conv2D(64, (1, bin_multiple*note_range), padding="same", activation='relu')(reshape)
    pool = MaxPooling2D(pool_size=(1, 2))(conv)
    for i in range(int(np.log2(bin_multiple))-1):
        print(i)
        # residual block
        bn = BatchNormalization()(pool)
        re = Activation('relu')(bn)
        freq_range = int((bin_multiple/(2**(i+1)))*note_range)
        print(freq_range)
        conv = Conv2D(64, (1, freq_range), padding="same", activation='relu')(re)
        # add and downsample
        ad = add([pool, conv])
        pool = MaxPooling2D(pool_size=(1, 2))(ad)
    flattened = Flatten()(pool)
    fc = Dense(1024, activation='relu')(flattened)
    do = Dropout(0.5)(fc)
    fc = Dense(512, activation='relu')(do)
    do = Dropout(0.5)(fc)
    outputs = Dense(note_range, activation='sigmoid')(do)
    model = Model(inputs=inputs, outputs=outputs)
    return model

model = resnet_model(bin_multiple)
init_lr = float(args['init_lr'])
model.compile(loss='binary_crossentropy',
              optimizer=SGD(lr=init_lr, momentum=0.9),
              metrics=['accuracy', 'mae', 'categorical_accuracy'])
model.summary()
history = model.fit_generator(trainGen.next(), trainGen.steps(), epochs=epochs,
                              verbose=1, validation_data=valGen.next(),
                              validation_steps=valGen.steps(), callbacks=callbacks,
                              workers=8, use_multiprocessing=True)
The problem occurs when your model receives a batch of size 0. In my case I had the error because I had 1000 examples and ran on multiple GPUs (2 GPUs) with a batch size of 32. In my graph I split the batch into mini-batches so that each GPU took 16 examples. At step 31 (31 * 32) I had finished 992 examples, so there were only 8 examples left; they went to GPU 1, and GPU 2 ended up with a zero batch size, which is why I received the error above.
I still couldn't solve it and am still searching for a proper solution.
I hope this helps you discover where in your code you receive a zero batch size.
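One way to sidestep it (a sketch using the hypothetical numbers from this answer, not necessarily the proper fix I am still looking for) is to size the generator so that only full batches are produced and the incomplete remainder is dropped:

n_examples = 1000
batch_size = 32                              # global batch, split across the GPUs
n_gpus = 2

steps_per_epoch = n_examples // batch_size   # 31 full steps -> 992 examples used
leftover = n_examples % batch_size           # the 8 examples that would form the bad last batch

# e.g. history = model.fit_generator(trainGen.next(), steps_per_epoch, epochs=epochs, ...)
print(steps_per_epoch, leftover)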

Getting different accuracies using different caffe classes(98.65 vs 98.1 vs 98.20)

When I train and then test my model using Caffe's command-line interface, I get e.g. 98.65%, whereas when I write code myself (given below) to calculate accuracy from the same pre-trained model, I get e.g. 98.1% using Caffe.Net.
Everything is straightforward and I have no idea what is causing the issue.
I also tried using Caffe.Classifier and its predict method, and got yet another, lower accuracy (i.e. 98.20%!).
Here is the snippet of code I wrote:
import sys
import caffe
import numpy as np
import lmdb
import argparse
from collections import defaultdict
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import itertools
from sklearn.metrics import roc_curve, auc
import random

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--proto', help='path to the network prototxt file(deploy)', type=str, required=True)
    parser.add_argument('--model', help='path to your caffemodel file', type=str, required=True)
    parser.add_argument('--mean', help='path to the mean file(.binaryproto)', type=str, required=True)
    #group = parser.add_mutually_exclusive_group(required=True)
    parser.add_argument('--db_type', help='lmdb or leveldb', type=str, required=True)
    parser.add_argument('--db_path', help='path to your lmdb/leveldb dataset', type=str, required=True)
    args = parser.parse_args()

    predicted_lables = []
    true_labels = []
    misclassified = []
    class_names = ['unsafe', 'safe']
    count = 0
    correct = 0
    batch = []
    plabe_ls = []
    batch_size = 50
    cropx = 224
    cropy = 224
    i = 0
    multi_crop = False
    use_caffe_classifier = True

    caffe.set_mode_gpu()
    # Extract mean from the mean image file
    mean_blobproto_new = caffe.proto.caffe_pb2.BlobProto()
    f = open(args.mean, 'rb')
    mean_blobproto_new.ParseFromString(f.read())
    mean_image = caffe.io.blobproto_to_array(mean_blobproto_new)
    f.close()

    net = caffe.Classifier(args.proto, args.model,
                           mean=mean_image[0].mean(1).mean(1),
                           image_dims=(224, 224))
    net1 = caffe.Net(args.proto, args.model, caffe.TEST)
    net1.blobs['data'].reshape(batch_size, 3, 224, 224)
    data_blob_shape = net1.blobs['data'].data.shape

    # check and see if its lmdb or leveldb
    if(args.db_type.lower() == 'lmdb'):
        lmdb_env = lmdb.open(args.db_path)
        lmdb_txn = lmdb_env.begin()
        lmdb_cursor = lmdb_txn.cursor()
        for key, value in lmdb_cursor:
            count += 1
            datum = caffe.proto.caffe_pb2.Datum()
            datum.ParseFromString(value)
            label = int(datum.label)
            image = caffe.io.datum_to_array(datum).astype(np.float32)
            #key,image,label
            #buffer n image
            if(count % 5000 == 0):
                print('{0} samples processed so far'.format(count))
            if(i < batch_size):
                i += 1
                inf = key, image, label
                batch.append(inf)
                #print(key)
            if(i >= batch_size):
                #process n image
                ims = []
                for x in range(len(batch)):
                    img = batch[x][1]
                    #img has c,w,h shape! its already gone through transpose and channel swap when it was being saved into lmdb!
                    #Method III : use center crop just like caffe does in test time
                    if (use_caffe_classifier != True):
                        #center crop
                        c, w, h = img.shape
                        startx = h//2 - cropx//2
                        starty = w//2 - cropy//2
                        img = img[:, startx:startx + cropx, starty:starty + cropy]
                        #transpose the image so we can subtract from mean
                        img = img.transpose(2, 1, 0)
                        img -= mean_image[0].mean(1).mean(1)
                        #transpose back to the original state
                        img = img.transpose(2, 1, 0)
                        ims.append(img)
                    else:
                        ims.append(img.transpose(2, 1, 0))

                if (use_caffe_classifier != True):
                    net1.blobs['data'].data[...] = ims[:]
                    out_1 = net1.forward()
                    plabe_ls = out_1['pred']
                else:
                    out_1 = net.predict(np.asarray(ims), oversample=multi_crop)
                    plabe_ls = out_1

                plbl = np.asarray(plabe_ls)
                plbl = plbl.argmax(axis=1)
                for j in range(len(batch)):
                    if (plbl[j] == batch[j][2]):
                        correct += 1
                    else:
                        misclassified.append(batch[j][0])
                    predicted_lables.append(plbl[j])
                    true_labels.append(batch[j][2])
                batch.clear()
                i = 0
                sys.stdout.write("\rAccuracy: %.2f%%" % (100.*correct/count))
                sys.stdout.flush()
    print(", %i/%i corrects" % (correct, count))
What is causing this difference in accuracies?
More information:
I am using Python 3.5 on Windows.
I read images from an lmdb dataset.
The images are 256x256 and center-cropped to 224x224.
It is fine-tuned on GoogLeNet.
For Caffe.predict to work well I had to change classify.py.
In training, I just use Caffe's defaults, such as random crops at training time and center crop at test time.
Changes:
changed line 35 to:
self.transformer.set_transpose(in_, (2, 1, 0))
and line 99 to:
predictions = predictions.reshape((len(predictions) // 10, 10, -1))
1) First off, you need to revert line 35 (32?) of classify.py, self.transformer.set_transpose(in_, (2, 1, 0)), back to the original
self.transformer.set_transpose(in_, (2, 0, 1)). So it expects HWC and transposes internally to CHW for downstream processing.
2) Run your Classifier branch as it is. You're likely to get a bad result. Please check this. If so, it means the image database is not CWH as you've commented, but actually CHW. After you've confirmed this, make the change to your Classifier branch: ims.append(img.transpose(2,1,0)) becomes ims.append(img.transpose(1,2,0)). Re-test your Classifier branch. The result should be 98.2% (go to Step 3) or 98.65% (try Step 4).
3) If your result in Step 2 is 98.2%, also undo your second change to classify.py. Theoretically, as your images have even height/width, // and / should make no difference. If they do differ, or the code crashes, something is seriously wrong with your image database: your assumption of the image size is incorrect. You need to check these. They could be off by a pixel or so, which could explain the slight discrepancies in accuracy.
4) If your result in Step 2 is 98.65%, then you need to make changes to the Caffe.Net branch of your code. The database images are CHW, so you need to make the first transpose img = img.transpose(1,2,0) and the second transpose, after mean subtraction, img = img.transpose(2,0,1). Then run your Caffe.Net branch. If you still get 98.1% as before, you should check that mean subtraction is performed correctly by your network.
In Steps (2) and (4), it's possible to get worse results, which means that the problem is likely a difference in mean subtraction between your trained Net and your expectations in the Python code. Check this.
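As a quick reference for the transposes discussed in these steps (the shapes here are just illustrative):

import numpy as np

img_chw = np.zeros((3, 224, 224), dtype=np.float32)  # CHW, the layout suspected for the lmdb images
img_hwc = img_chw.transpose(1, 2, 0)                 # CHW -> HWC
img_back = img_hwc.transpose(2, 0, 1)                # HWC -> CHW
print(img_hwc.shape, img_back.shape)                 # (224, 224, 3) (3, 224, 224)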
About your 98.2% for the caffe.Classifier:
If you look at lines 78-80, the center crop is done along crop_dims, not img_dims. If you further look at line 42 of the caffe.Classifier constructor, the crop_dims are never user-determined; they are determined by the size of the Net's input blobs. Lastly, if you look at line 70, the img_dims are used to resize the images prior to center cropping. So what's happening with your setup is: the images are first getting resized to 224 x 224, then uselessly getting center-cropped to 224 x 224 (I assume this is the HxW of your Net). You will obviously get results poorer than 98.65%. What you need to do is change img_dims to (256, 256). That prevents resizing. The crop size will be picked up automatically from your Net and you should get your 98.65%.
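For reference, the corresponding change to the constructor call from the question (only image_dims differs; the images are already 256x256, so this effectively disables resizing, and the 224x224 center crop is taken from the Net's input blob):

net = caffe.Classifier(args.proto, args.model,
                       mean=mean_image[0].mean(1).mean(1),
                       image_dims=(256, 256))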