Given a pre-trained ResNet152, I'm trying to compute prediction benchmarks on some common datasets (using PyTorch), and the first RGB dataset that came to mind was CIFAR10. The thing is that CIFAR10 data is 3x32x32 and ResNet expects 3x224x224. I've resized the data using the usual transforms approach:
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
train = datasets.CIFAR10(root='./data', train=True, download=True, transform=preprocess)
test = datasets.CIFAR10(root='./data', train=False, download=True, transform=preprocess)
train_loader = torch.utils.data.DataLoader(train, batch_size=batch_size)
test_loader = torch.utils.data.DataLoader(test, batch_size=batch_size)
but this results in blurry samples and bad predictions. I was wondering what the best approaches are in such cases, as I see many papers using these datasets with advanced models like ResNet and VGG, and I'm not sure how this technical issue is usually resolved.
Thank you for your response!
Yes, you need to resize input images to 3x224x224. By doing so, after a normal training procedure, you should achieve outstanding results on CIFAR-10 (e.g. around 96% on the test set).
I guess the main problem is that you're using a network pre-trained on higher-resolution images (ResNet152 comes pre-trained on ImageNet); without any further training, you can't expect good results after changing the dataset so drastically.
For image classification benchmarks, I recommend using the commonly used higher-resolution datasets (also pre-defined inside torchvision), such as LSUN or Places365.
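That said, if you want to stay with CIFAR-10, the usual fix is to fine-tune the pre-trained network on it. Here is a minimal sketch of one fine-tuning epoch, reusing the train_loader from your question; the optimizer settings are illustrative assumptions, not tuned values:

import torch
import torch.nn as nn
from torchvision import models

# Load ImageNet-pretrained ResNet152 and replace the 1000-way head
# with a 10-way classifier for CIFAR-10.
model = models.resnet152(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 10)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

criterion = nn.CrossEntropyLoss()
# A small learning rate is typical when fine-tuning pretrained weights.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

model.train()
for images, labels in train_loader:  # train_loader as defined in the question
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()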
Related
I'm using a ResNet50 model to classify images into two classes: normal cells and cancer cells.
I want to increase the accuracy but I don't know what to modify.
from tensorflow.keras import layers, Model
from tensorflow.keras.optimizers import Adamax
# We are using ResNet50 for transfer learning here, so we import it.
from tensorflow.keras.applications import resnet50

# Initialize the base model with weights='imagenet', i.e. keep its original weights.
model_name = 'resnet50'
base_model = resnet50.ResNet50(include_top=False, weights="imagenet",
                               input_shape=img_shape, pooling='max')
last_layer = base_model.output  # take the last layer of the base model
# Add a flatten layer: we extend the network by flattening the base output.
flatten = layers.Flatten()(last_layer)
# Add a dense layer.
dense1 = layers.Dense(100, activation='relu')(flatten)
# Final output layer; it takes dense1 as input so the 100-unit layer is actually used.
output_layer = layers.Dense(class_count, activation='softmax')(dense1)
# Create the model with input and output layers.
model = Model(inputs=base_model.inputs, outputs=output_layer)
model.compile(Adamax(learning_rate=.001), loss='categorical_crossentropy',
              metrics=['accuracy'])
There were 48 errors in 534 test cases. Model accuracy = 91.01%.
Also, what do you think about the results in the graph?
This is the classification report.
I got good results, but is there a possibility to increase the accuracy beyond that?
This is a broad question, as there are many ways one can attempt to improve a network's accuracy. Some of them are:
- Increase the dimension of the layers that are learned during transfer learning (making sure not to overfit)
- Use transfer learning with convolutional layers rather than MLP layers
- Let the optimization algorithm adjust the learning rate on its own (see the sketch after this list)
- Play with additional augmentations of the dataset
and the list goes on.
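For the learning-rate point, a minimal sketch using Keras's ReduceLROnPlateau callback; the monitor, factor, and patience values are illustrative assumptions, as are the training-data variables:

from tensorflow.keras.callbacks import ReduceLROnPlateau

# Halve the learning rate whenever val_loss stops improving for 3 epochs.
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5,
                              patience=3, min_lr=1e-6)

# train_images/train_labels are stand-ins for your own data.
model.fit(train_images, train_labels, validation_split=0.2,
          epochs=30, callbacks=[reduce_lr])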
Also, if possible, I would suggest comparing your results to other publicly available benchmarks; by doing so you might get a better sense of the achievable upper bound on accuracy.
I am confused about the definition of data augmentation. Should we train on both the original data points and the transformed ones, or just the transformed ones? If we train on both, we increase the size of the dataset, while the second approach doesn't.
This question came up when I was using the function RandomResizedCrop.
'train': transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
]),
If we resize and crop parts of the dataset randomly, we don't actually increase the size of the dataset. Is that correct? Or does data augmentation just require changing/modifying the original dataset rather than increasing its size?
Thanks.
By definition, or at least according to the influential 2012 AlexNet paper that popularized data augmentation in computer vision, data augmentation increases the size of the training set; hence the word augmentation. Go ahead and have a look at Section 4.1 of the AlexNet paper. Here is the gist of it, quoted from the paper:
The easiest and most common method to reduce overfitting on image data is to artificially enlarge the dataset using label-preserving transformations. The first form of data augmentation consists of generating image translations and horizontal reflections. We do this by extracting random 224 × 224 patches (and their horizontal reflections) from the 256×256 images and training our network on these extracted patches. This increases the size of our training set by a factor of 2048, though the resulting training examples are, of course, highly interdependent.
As for the specific implementation, it depends on your use case and, most importantly, on the size of your training data. If you're short on training data, you should consider training on both the original and the transformed images, taking sufficient care to preserve the labels.
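For example, in PyTorch one way to train on both the originals and the transformed copies is to concatenate two instances of the dataset, one per transform. A sketch, where CIFAR10 is just a stand-in dataset and preprocess_plain / preprocess_augmented are hypothetical Compose pipelines:

from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets

# Two views of the same training set: one deterministic, one randomly augmented.
plain = datasets.CIFAR10(root='./data', train=True, download=True,
                         transform=preprocess_plain)          # hypothetical pipeline
augmented = datasets.CIFAR10(root='./data', train=True, download=True,
                             transform=preprocess_augmented)  # hypothetical pipeline

# One epoch now covers 2x the original number of samples.
train_loader = DataLoader(ConcatDataset([plain, augmented]),
                          batch_size=64, shuffle=True)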
transforms.Compose is just preprocessing: it converts an image from one form to another, suitable form.
Applying a transformation to a single image changes its pixel values on the fly; it does not increase the dataset size.
To get a larger dataset, you have to perform operations like the following:
import numpy as np
from tqdm import tqdm
from skimage.transform import rotate
from skimage.util import random_noise

final_train_data = []
final_target_train = []
for i in tqdm(range(train_x.shape[0])):
    final_train_data.append(train_x[i])                                 # original image
    final_train_data.append(rotate(train_x[i], angle=45, mode='wrap'))  # rotated copy
    final_train_data.append(np.fliplr(train_x[i]))                      # horizontal flip
    final_train_data.append(np.flipud(train_x[i]))                      # vertical flip
    final_train_data.append(random_noise(train_x[i], var=0.2**2))       # noisy copy
    for j in range(5):  # one label per copy (original + 4 augmented)
        final_target_train.append(train_y[i])
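After the loop, you would typically convert the lists back to arrays before training; a small follow-up:

final_train_data = np.array(final_train_data)
final_target_train = np.array(final_target_train)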
For more detail, see the PyTorch documentation.
I have a small dataset of 500 plant images, and I have to predict a number in the range [1, 10] for each image. There is an order relation between the numbers (10 > 9 > ... > 1). This problem is similar to age estimation based on a single photo.
I tried regression using ResNet18, ResNet34 and VGG16. None of them gave a very good result.
The interesting point is that when I plotted the heatmap for a few images, it showed that the model was picking the wrong spots to predict the answer. It's as if, when predicting age from a facial photo, the CNN gave more weight to the background than to the actual face.
I tried other approaches as well, like classification and learning to rank, but the same thing happens in the heatmaps. With these approaches, the best accuracy I get is 30% using classification and 35% using learning to rank.
For the regression and classification approaches I used the Fastai implementation with pretrained models. For the learning-to-rank approach I used this: https://github.com/Raschka-research-group/coral-cnn. I changed it a little to be able to use a pretrained model as well.
Another important point is that the dataset is unbalanced: 80% of the dataset corresponds to classes 6 to 10.
Does anyone have any tips to improve this, or another approach I could try?
EDIT:
My data augmentation looks like this:
transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.15),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
You can try augmenting your dataset to obtain more data (e.g. random cropping, rotating, etc.), and make sure you normalise your data. For the class imbalance problem, you can try PyTorch's WeightedRandomSampler:
import torch
from torch.utils.data import WeightedRandomSampler

# Say there are 9 samples in class 0 and 1 sample in class 1.
class_counts = [9.0, 1.0]
num_samples = sum(class_counts)
labels = [0, 0,..., 0, 1]  # corresponding labels of the samples
# Weight each class inversely to its frequency, then weight each sample by its class.
class_weights = [num_samples / class_counts[i] for i in range(len(class_counts))]
weights = [class_weights[labels[i]] for i in range(int(num_samples))]
sampler = WeightedRandomSampler(torch.DoubleTensor(weights), int(num_samples))
You should be able to apply this to your case with 10 classes easily. Hope this solves your problem!
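To actually use the sampler, pass it to your DataLoader; train_dataset here is a stand-in for your own dataset, and note that sampler and shuffle=True are mutually exclusive:

from torch.utils.data import DataLoader

# The sampler replaces shuffling: it draws indices according to the weights,
# so minority classes show up more often per epoch.
train_loader = DataLoader(train_dataset, batch_size=32, sampler=sampler)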
I came across a tutorial where the author uses an LSTM network for time series prediction like this:
import numpy
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Reshape to (samples, time_steps=1, features=look_back).
trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = numpy.reshape(testX, (testX.shape[0], 1, testX.shape[1]))

model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=2)
Do we agree that the LSTM here acts like a normal NN (and is therefore useless?), since it gets only one time step and stateful=True is not set? Am I right?
Generally speaking, you are correct. The input shape should be (window_length, n_features).
However, there has been some success with transforming the input the way you describe above. Below is a paper where the authors were able to beat many top-performing algorithms by doing so; they used 1D convolutional layers to handle the time-series pattern through a separate input:
LSTM Fully Convolutional Networks for Time Series Classification
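For comparison, feeding the LSTM a real sequence would look roughly like this; a sketch based on the variables in the question, starting from the original 2-D trainX of shape (samples, look_back):

# Shape (samples, time_steps, features): look_back real time steps,
# one feature per step, instead of a single step of length look_back.
trainX = numpy.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))

model = Sequential()
model.add(LSTM(4, input_shape=(look_back, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')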
I am working with Keras 2.0.0 and I'd like to train a deep model with a huge number of parameters on a GPU.
As my dataset is big, I have to use ImageDataGenerator. To be honest, I want to abuse ImageDataGenerator in the sense that I don't want to perform any augmentations; I just want to put my training images into batches (and rescale them), so I can feed them to model.fit_generator.
I adapted the code from here and made some small changes according to my data (i.e. changing binary classification to categorical, but this doesn't matter for the problem discussed here).
I have 15,000 training images, and the only 'augmentation' I want to perform is rescaling to the range [0, 1] via train_datagen = ImageDataGenerator(rescale=1./255).
After creating my train_generator:
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=True,
    seed=1337,
    save_to_dir=save_data_dir)
I fit the model using model.fit_generator().
I set the number of epochs to: epochs = 1
And the batch size to: batch_size = 60
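So the fit call looks roughly like this (steps_per_epoch is derived from the numbers above):

# 15000 images at 60 per batch = 250 steps to see each image once per epoch.
model.fit_generator(train_generator,
                    steps_per_epoch=15000 // batch_size,
                    epochs=1)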
What I expect to see in the directory where my augmented (i.e. resized) images are stored: 15,000 rescaled images per epoch, i.e. with only one epoch, 15,000 rescaled images. But, mysteriously, there are 15,250 images.
Is there a reason for this number of images?
Do I have the power to control the number of augmented images?
Similar problems:
Model fit_generator not pulling data samples as expected (see also on Stack Overflow: Keras - How are batches and epochs used in fit_generator()?)
A concrete example for using data generator for large datasets such as ImageNet
I appreciate your help.