I am trying to build a multilabel classifier so that I can identify and count objects in a picture. I have my own dataset that I collected, and I have labeled regions in it.
For the sake of simplicity, here is an example. Let's say I am building a classifier to classify dogs and cats that appear in the same image, and I have prepared the dataset by defining regions.
If the image's size is 1200x1200:
First region: [[0,600],[0,1200]] will be labeled as 0 (cat),
Second region: [[600,1200],[0,1200]] will be labeled as 1 (dog).
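For context, this is roughly how I picture the annotation structure (a hypothetical sketch; the [[x_min,x_max],[y_min,y_max]] region format and the label IDs just follow the example above):

annotations = {
    "image_0001.jpg": [
        {"region": [[0, 600], [0, 1200]], "label": 0},     # cat in the left half
        {"region": [[600, 1200], [0, 1200]], "label": 1},  # dog in the right half
    ],
}

# Counting objects per class then reduces to counting labels per image.
from collections import Counter
counts = Counter(obj["label"] for obj in annotations["image_0001.jpg"])
print(counts)  # Counter({0: 1, 1: 1}) -> one cat, one dog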
Reading the example linked here, I couldn't figure out how to define regions so that I can train the classifier to know that an object X is in region [x,y,w,z].
Preparing the script this way would help me categorize the image, but it wouldn't help with counting the number of objects in the image.
Any help would be appreciated.
I have written code that transforms time series data (medical EEGs) into a scatter chart. I have labels available for a training set noting the presence or absence of 'seizures' in the data. I can see the seizures clearly (as a human) in the scatter plots. I thought it would be straightforward to adapt a pre-trained image classifier (AlexNet) to do the (seizure, no_seizure) binary classification. I have a training set of 500 chart images. The model is not converging.
I replaced the final AlexNet layer before training:
model.classifier[6] = torch.nn.Linear(model.classifier[6].in_features, 2)
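For reference, here is a minimal sketch of how I set this up (assuming torchvision's pretrained AlexNet; the optimizer, learning rate, and transforms are illustrative rather than my exact values):

import torch
import torchvision

# Load a pretrained AlexNet and swap the last classifier layer for 2 classes.
model = torchvision.models.alexnet(pretrained=True)
model.classifier[6] = torch.nn.Linear(model.classifier[6].in_features, 2)

# Optionally freeze the convolutional backbone and train only the classifier head.
for param in model.features.parameters():
    param.requires_grad = False

optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

# AlexNet expects 224x224, 3-channel inputs normalized with ImageNet statistics.
transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize((224, 224)),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])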
Do you have any advice to help me with this challenge? Intuitively, I thought classification of scatter charts would be easier than photographic image classification.
I'm currently working on my bachelor's project, and I'm using the PointNet deep neural network.
My project group and I have created a dataset of point clouds (an unordered list of x 3D coordinates each) and segmentation files, but we can't train PointNet to predict segmentation with our dataset.
Each segmentation file is a list with the same number of rows as there are points in the corresponding point cloud, and each row is either a 1 or a 2, depending on whether the corresponding point belongs to segment 1 or segment 2.
When PointNet predicts, it outputs a list of x elements, where each element is the segment that PointNet predicts the corresponding point belongs to.
When we run the benchmark dataset from the original PointNet implementation, the system runs and can predict segmentation, so we know the error is somewhere in our dataset, even though we have tried our best to make it look like the original benchmark dataset.
The implemented PointNet uses PyTorch Conv2d, MaxPool2d, and Linear layers. For calculating the loss, both nn.functional.nll_loss and nn.NLLLoss have been used. When using nn.NLLLoss, the weight parameter was set to a tensor of [1, 100] to combat potential class imbalance in the data; a small sketch of this setup is shown just below.
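This is roughly the loss setup (a minimal sketch rather than our exact code; the shapes and the log_softmax placement are assumptions):

import torch
import torch.nn as nn
import torch.nn.functional as F

num_points = 1024                         # assumed number of points per cloud
logits = torch.randn(num_points, 2)       # stand-in for the per-point network output
log_probs = F.log_softmax(logits, dim=1)  # NLLLoss expects log-probabilities

# NLLLoss expects class indices starting at 0, so labels of 1/2 are shifted to 0/1 here.
labels = torch.randint(1, 3, (num_points,)) - 1

criterion = nn.NLLLoss(weight=torch.tensor([1.0, 100.0]))
loss = criterion(log_probs, labels)
print(loss.item())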
These are the things we have tried:
We have tried downsampling the point clouds, i.e., removing points using voxel downsampling.
We have tried downscaling and normalizing all values so they are between 0 and 1, using the formula (data - np.min(data)) / (np.max(data) - np.min(data)); see the sketch after this list.
We have tried running a Euclidean clustering function on the data, so that each scanned object is separated by itself.
We have tried replicating another dataset, created from the same raw data, which we know has worked before.
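For reference, this is essentially the min-max normalization mentioned above (a minimal sketch; the array shape is an assumption):

import numpy as np

# Hypothetical point cloud of shape (num_points, 3).
data = np.random.uniform(-5.0, 5.0, size=(1024, 3))

# Min-max normalization from the list above: rescale every value into [0, 1].
normalized = (data - np.min(data)) / (np.max(data) - np.min(data))
print(normalized.min(), normalized.max())  # 0.0 1.0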
Images of the data files, with a description, can be found in the attached link.
Cheers everyone
I'm studying deep learning (supervised learning) to estimate depth images from monocular images.
The dataset currently uses KITTI data: the RGB input images come from the KITTI raw data, and the data from the following link is used as ground truth.
While training a model I designed as a simple encoder-decoder network, the results were not good, so I have been making various attempts to improve them.
While searching for various methods, I found that training should use only the valid areas of the ground truth, masking out the many invalid areas, i.e., values that cannot be used, as shown in the image below.
So I trained with masking, but I am curious why this result keeps coming out.
This is the training part of my code.
How can I fix this problem?
for epoch in range(num_epoch):
    model.train()  ### train ###
    for batch_idx, samples in enumerate(tqdm(train_loader)):
        x_train = samples['RGB'].to(device)
        y_train = samples['groundtruth'].to(device)
        pred_depth = model(x_train)
        valid_mask = y_train != 0  #### Here is masking: keep only pixels with valid ground truth
        valid_gt_depth = y_train[valid_mask]
        valid_pred_depth = pred_depth[valid_mask]
        loss = loss_RMSE(valid_pred_depth, valid_gt_depth)
        # standard optimization step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
As far as I can understand, you are trying to estimate depth from an RGB image as input. This is an ill-posed problem, since the same input image can correspond to multiple plausible depth maps. You would need to integrate certain techniques to estimate accurate depth from RGB images instead of simply taking an L1 or L2 loss between the predicted depth and the corresponding ground-truth depth.
I would suggest going through some papers on estimating depth from single images, such as Depth Map Prediction from a Single Image using a Multi-Scale Deep Network, where they use one network to first estimate the global structure of the given image and then a second network to refine the local scene information. Instead of taking a simple RMSE loss, as you did, they use a scale-invariant error function in which the relationship between points is measured.
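For illustration, here is a small sketch of that scale-invariant log-depth error (my reading of the paper's formulation, with lambda = 0.5; the variable names and shapes are mine):

import torch

def scale_invariant_loss(pred_depth, gt_depth, lam=0.5, eps=1e-8):
    # Scale-invariant log-depth error, computed only over pixels with valid (non-zero) ground truth.
    valid = gt_depth > 0
    d = torch.log(pred_depth[valid] + eps) - torch.log(gt_depth[valid] + eps)
    n = d.numel()
    return (d ** 2).sum() / n - lam * (d.sum() ** 2) / (n ** 2)

# Usage with random stand-in tensors of shape (batch, 1, H, W):
pred = torch.rand(2, 1, 64, 64) * 10 + 0.1
gt = torch.rand(2, 1, 64, 64) * 10
print(scale_invariant_loss(pred, gt).item())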
I'm new to deep learning, and I'm writing a 1D CNN model to train on my dataset. Each sample in the dataset contains three channels: (raw_data, sex, age). The sampling rate of raw_data is 50 Hz and the window size is 2 seconds, which means 100 samples per data point. So my data looks like this:
([1,2,3,4,...,100],[1],[20]) where [1] means sex and [20] means age.
Now I want to put this data into my 1D CNN model. I know that nn.Conv1d expects input of shape (batch_size, channels, width). If the width were the same for all channels there would be no problem, but in my case raw_data contains 100 elements while sex and age contain only 1 element each. How can I feed this into my Conv1d layer?
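To make the shape mismatch concrete, here is a small sketch of the data as described (the tensor names and layer parameters are mine, just to illustrate the problem):

import torch

batch_size = 8

# Shapes as described above: 100 raw samples per window, plus scalar sex and age.
raw_data = torch.randn(batch_size, 1, 100)  # (batch, channels, width) layout for Conv1d
sex = torch.zeros(batch_size, 1)            # e.g. a 0/1 encoding
age = torch.full((batch_size, 1), 20.0)

conv = torch.nn.Conv1d(in_channels=1, out_channels=16, kernel_size=5)
out = conv(raw_data)
print(out.shape)  # torch.Size([8, 16, 96]); sex and age don't fit this (batch, 1, 100) layout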
Any help would be appreciated. Thank you!
I'm new to deep learning, and I started with the TensorFlow tutorials (the beginner one and the expert one). In both of them, the data is imported at the beginning with these 2 lines:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
I would like to use this neural network on my own images. I have 100,000 images and a fileLabel.txt giving the labels for each image in order, by column. Is there a way to change these two lines, or a few others, to import my images without breaking all the code? I really don't see how to do that; I have the impression that the mnist structure is specific to the images of the tutorial.
Thanks in advance for your help
The short answer to your question is yes, it's possible. You don't need to break any code IF your data is also similar to the MNIST data, with 10 labels and well organized.
Assuming that is not the case, you need to organize your input data so that you can define (create) your model.
Organizing your input data includes:
Having consistent image sizes (for example, MNIST uses 28x28 pixel images)
Labeling your images (for example, MNIST has 10 labels, 0 to 9)
Finally, deciding how you intend to split your data (for example, the MNIST data is split into three parts: 55,000 data points of training data (mnist.train), 10,000 points of test data (mnist.test), and 5,000 points of validation data (mnist.validation)); a simple splitting sketch is shown below.
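For example, a plain-Python split along those lines could look like this (the file list, placeholder labels, and split sizes are hypothetical; adapt them to your 100,000 images and fileLabel.txt):

# Hypothetical parallel lists built from your image folder and fileLabel.txt.
filenames = ['img_%05d.png' % i for i in range(100000)]
labels = [0] * 100000  # placeholder labels

# Keep the split analogous to MNIST: training / validation / test.
n_train, n_val = 85000, 5000
train_files, train_labels = filenames[:n_train], labels[:n_train]
val_files, val_labels = filenames[n_train:n_train + n_val], labels[n_train:n_train + n_val]
test_files, test_labels = filenames[n_train + n_val:], labels[n_train + n_val:]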
Once you have organized your input data, you read it by writing a small function like read_images that does something like:
reader = tf.WholeFileReader()
key, value = reader.read(filename_queue)
....
Then, assuming you want to label your data ahead of time, similar to the MNIST data, you can store the labels in a file and read them in your program.
After that, you would populate tf.train.string_input_producer() with a list of strings containing the filenames and labels.
....
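Putting those pieces together, a rough sketch of such an input pipeline (TensorFlow 1.x queue API) could look like the following. It uses tf.train.slice_input_producer to keep each filename paired with its label, so it deviates slightly from the WholeFileReader route above, and the paths, labels, image format, and sizes are placeholders:

import tensorflow as tf

# Hypothetical lists built from your image folder and fileLabel.txt.
filenames = ['images/img_00000.png', 'images/img_00001.png']
labels = [3, 7]

# slice_input_producer yields one synchronized (filename, label) pair at a time.
filename, label = tf.train.slice_input_producer([filenames, labels], shuffle=True)

raw = tf.read_file(filename)                      # read the raw image bytes
image = tf.image.decode_png(raw, channels=1)      # decode to a grayscale tensor
image = tf.image.resize_images(image, [28, 28])   # match the tutorial's 28x28 input
image = tf.reshape(image, [28 * 28]) / 255.0      # flatten and scale to [0, 1] like MNIST

images_batch, labels_batch = tf.train.batch([image, label], batch_size=64)

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    imgs, lbls = sess.run([images_batch, labels_batch])
    coord.request_stop()
    coord.join(threads)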