Where do we pass the labels to a CNN image classifier? - deep-learning

I have a simple, basic question. When we train an image classifier (e.g. with a CNN), where exactly do we tell the model that an image is a cat or a dog? Before backpropagation the model must already know whether it is a dog or a cat, because it adjusts its weights accordingly.
Our data is labelled, but as far as I can tell we are not reading this information or passing it to the network. I am working on image captioning, which is much more complicated. I appreciate any clarification, thank you in advance!

The label is used in the loss (a.k.a. cost) function. When you feed an image to the network, it produces an output; the loss measures how far that output is from the ground-truth label. The optimization algorithm then tries to minimize this loss by adjusting the weights.
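As a minimal sketch of where the label enters, here is a softmax cross-entropy loss in plain Python. The two-class logits and the class order [cat, dog] are made-up values for illustration, not taken from any particular framework; real libraries compute the same quantity on batches of tensors.

```python
import math

def softmax(logits):
    """Turn raw network outputs into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label_index):
    """Loss for one example; label_index is the ground-truth class."""
    probs = softmax(logits)
    return -math.log(probs[label_index])

# Hypothetical network output for one image, class order [cat, dog].
logits = [2.0, 0.5]

# The same output gives a different loss depending on the label,
# which is the only place the network "hears" what the image was.
loss_if_cat = cross_entropy(logits, 0)  # network agrees with the label
loss_if_dog = cross_entropy(logits, 1)  # network disagrees with the label
print(loss_if_cat, loss_if_dog)
```

The network here leans towards "cat", so the loss is small when the label is cat and larger when the label is dog. Backpropagation then uses the gradient of this loss to update the weights.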

Related

What are the ideal steps for using predicted segmentation masks in watershed post-processing?

I am experimenting with object segmentation (round objects that often occur close together). I used the U-Net deep neural network architecture for segmentation and obtained segmentation masks, which I saved in npy format.
I am a beginner in this area. I would like to know the ideal steps to follow now if I want to apply watershed to the predicted masks, with the aim of separating the objects.
I guess I need to convert the predicted binary mask into some form from which I can obtain markers indicating the centroids.
Please help.

Is it theoretically reasonable to use a CNN for categorical and numerical data?

I'm trying to use a CNN to do binary classification.
Because CNNs are strong at feature extraction, they have been widely used for pattern data such as images and audio.
However, my dataset is not image or audio data but categorical and numerical data, which is a different case.
My questions are as follows.
Is it theoretically reasonable to use a CNN for data of this kind?
If so, would it be reasonable to artificially arrange my dataset in a two-dimensional form and apply a 2D CNN?
I often see CNNs used in many classifiers on Kaggle and in various media, not only for images and audio but also for numerical and categorical data like mine.
I really wonder whether this is a problem in theory, and I would appreciate it if you could recommend any related paper or research.
I'm looking forward to any advice about this situation. Thank you for your answer.
CNNs for images apply kernels to neighboring pixels and blocks of the image. CNNs for audio work on spectrograms, i.e. they also rely on the proximity of input values.
If your inputs have some sort of closeness structure (e.g. time series, graphs...), then a CNN might be useful.
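To illustrate why proximity matters, here is a small sketch of a 1D convolution in plain Python (strictly a cross-correlation, which is what CNN layers compute). The series values and the difference kernel are made up for illustration. A kernel only ever combines adjacent entries, so its output is meaningful when adjacency is meaningful, as in a time series, and becomes arbitrary when the column order is arbitrary, as in a typical table of unrelated features.

```python
def conv1d(values, kernel):
    """Slide the kernel over the sequence, combining neighboring entries."""
    k = len(kernel)
    return [
        sum(values[i + j] * kernel[j] for j in range(k))
        for i in range(len(values) - k + 1)
    ]

# A toy time series: neighbors are consecutive measurements.
series = [1.0, 2.0, 4.0, 7.0, 11.0]

# A difference kernel: responds to the change between adjacent entries.
diff_kernel = [-1.0, 1.0]
steps = conv1d(series, diff_kernel)
print(steps)  # [1.0, 2.0, 3.0, 4.0]

# The same values in an arbitrary order, like unrelated tabular columns.
reordered = [series[i] for i in (3, 0, 4, 1, 2)]
steps_reordered = conv1d(reordered, diff_kernel)
print(steps_reordered)  # no longer a clean trend
```

On the ordered series the kernel recovers the growing step size; on the reordered values the output depends entirely on the arbitrary ordering, which is the situation a 2D arrangement of unrelated features would create.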

How to train a pre-trained CNN on a new dataset that is not organised in classes (unsupervised)

I have a pretrained CNN (ResNet-18) trained on ImageNet, and I want to extend it to my own dataset of video frames. The problem is that all the tutorials I found on fine-tuning require the dataset to be organised in classes, like
class1/train/
class1/test/
class2/train/
class2/test/
but I only have frames from many videos. How can I train my CNN on them?
Can anyone point me in the right direction, to a tutorial, paper, etc.?
PS: My final task is to get deep features for all the frames that I provide at test time.
To train a network, you need a label (sometimes called y) for each input. The network computes a loss between its logits (the network's raw output) and the given label.
The network then revises its own weights using that loss value through backpropagation. That process is what we call training.
Because you only have input data and no labels, you can only get the logits, which means the loss cannot be calculated.
Fine-tuning means almost the same as "additional training", so you cannot fine-tune your pre-trained network this way without labeled data.
As for the train set and test set, that is not the problem right now.
If you have enough labeled input data, you can split it by some ratio
(e.g. 80% of the data for training, 20% for testing).
The reason for dividing the data into these two sets is that we want to check how the trained network performs in a more general, unseen situation.
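A minimal sketch of such an 80/20 split in plain Python is shown below. The file names, the labels, and the helper name `train_test_split` are hypothetical, chosen only for illustration; libraries such as scikit-learn offer a ready-made function for the same purpose.

```python
import random

def train_test_split(samples, test_ratio=0.2, seed=0):
    """Shuffle the samples and hold out a fraction of them for testing."""
    shuffled = list(samples)  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labeled data: (file name, class index) pairs.
labeled = [("frame_%03d.jpg" % i, i % 2) for i in range(100)]

train_set, test_set = train_test_split(labeled)
print(len(train_set), len(test_set))  # 80 20
```

Shuffling before splitting keeps the held-out part from being biased by the original file order. For video frames it may be safer to split by video rather than by frame, since neighboring frames are nearly identical.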
However, if you simply feed your data into the pre-trained network (the encoder part), it will give you deep features. They are not tuned exactly to your task, but they are still deep features.
Added:
Unsupervised pre-training for convolutional neural network in theano
This is the method you need: a deep feature encoder for the unsupervised setting. I hope it helps.

How to set up the appropriate model settings and layers for high intra-class variation

Hello all experts,
I am new to CNNs and Caffe. I have a classification task with 2 classes. The dataset I have collected is very small, about 50 images for class A and 50 for class B (I know that this is very, very small). The images are of humans.
I took the BVLC model and changed settings such as the batch size for testing and training and the maximum number of iterations. I tried many different setups, but the model doesn't work.
Any idea how to come up with an appropriate model or setting, or any other solution?
Remark: I once made a random change to the BVLC model setup and it worked, but I lost the setup file.
I got the train.prototxt and Solve.prototxt from Adil Moujahid.
I tried training batch sizes of 32, 64, 128 and 256 and testing batch sizes of 5, 20 and 30, but they all failed.
The dataset consists of images of normal women and beautiful women, which I want to classify, but Stack Overflow does not allow me to add more than 2 links.
I wonder whether there is any formula, equation or procedure I can use to choose the right model settings.
Thank you in advance.
What do you mean by "doesn't work"? Does the loss stay too high? Does training converge but with low accuracy? Andrew Ng has an excellent session on "debugging" CNNs - Nuts and Bolts of Building Applications using Deep Learning (NIPS slides, summary, additional summary).
My humble guess is that your network has an overfitting problem - it learns the specific examples and can't generalize - so enlarging the training dataset, adding regularization, or using data augmentation can help.

Many challenges in obtaining semantic segmentation results for a long time

I did not have any choice except to ask here. I have been struggling with this for a long time. I have not been able to observe any output from FCN32 :(
I trained FCN32 on my data from scratch and always get a black image. I added Gaussian initialization with std = 0.01 for the convolutional layers, but I still get a black image.
I tried to add weighted loss layers. However, I did not manage to add them correctly. I am not good at Python and C++.
My questions:
Is there a correct PR that makes it easy to include this layer?
My data has 5 classes whose proportions differ from image to image. How can I create these weight matrices for each image?
I really appreciate any help. Please share any resource or link you know of, or tell me if I can get it from other networks' repositories.