U-Net segmentation without having a mask

I am new to deep learning and semantic segmentation.
I have a dataset of medical images (CT scans) in DICOM format, in which I need to segment tumours and the organs involved. I have the organs contoured by our physician, which we call the RT structure (RTSTRUCT), also stored in DICOM format.
As far as I know, people usually use a "mask". Does that mean I need to convert all the contoured structures in the RT structure to masks, or can I use the information from the RT structure (.dcm) directly as my input?
Thanks for your help.

There is a dedicated library called pydicom that you need to install before you can decode and visualise the CT images.
Since you want to apply semantic segmentation to segment the tumours, the solution is to create a neural network that accepts as input [image, mask] pairs, where every location in the mask is 0 except for the zones where the tumour is, which are marked with 1; in practice, the mask is your ground truth. So yes, you need to convert the contours in your RT structure into masks.
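To build such masks from an RTSTRUCT, you can rasterise each contour polygon onto the CT pixel grid. A minimal sketch, assuming a single axis-aligned CT slice and placeholder file names (skimage is used for the polygon fill):

```python
import numpy as np
import pydicom
from skimage.draw import polygon

ct = pydicom.dcmread("ct_slice.dcm")           # placeholder file names
rt = pydicom.dcmread("rt_structure.dcm")

mask = np.zeros(ct.pixel_array.shape, dtype=np.uint8)
origin = np.array(ct.ImagePositionPatient)     # patient coords (mm) of pixel (0, 0)
spacing = np.array(ct.PixelSpacing)            # [row spacing, column spacing] in mm

# First ROI, first contour only; a real converter loops over all ROIs and
# matches each contour to its CT slice by z coordinate.
contour = rt.ROIContourSequence[0].ContourSequence[0]
pts = np.array(contour.ContourData).reshape(-1, 3)   # (x, y, z) triplets in mm

# mm -> pixel indices, assuming standard axis-aligned image orientation
rows = (pts[:, 1] - origin[1]) / spacing[0]
cols = (pts[:, 0] - origin[0]) / spacing[1]

rr, cc = polygon(rows, cols, shape=mask.shape)
mask[rr, cc] = 1                               # 1 inside the tumour, 0 elsewhere
```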
Of course, for this you will have to implement your own CustomDataGenerator(), which must yield at every step a batch of [image, mask] pairs as described above.
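Such a generator might look like this sketch for Keras; load_ct_slice() and load_mask() are hypothetical loaders you would implement yourself (e.g. with pydicom and the contour-to-mask conversion above):

```python
import numpy as np
from tensorflow.keras.utils import Sequence

class CustomDataGenerator(Sequence):
    """Yields a batch of [image, mask] pairs at every step."""

    def __init__(self, image_paths, mask_paths, batch_size=8):
        self.image_paths = image_paths
        self.mask_paths = mask_paths
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.image_paths) / self.batch_size))

    def __getitem__(self, idx):
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        # load_ct_slice() / load_mask() are hypothetical placeholders
        images = np.stack([load_ct_slice(p) for p in self.image_paths[lo:hi]])
        masks = np.stack([load_mask(p) for p in self.mask_paths[lo:hi]])
        return images[..., np.newaxis], masks[..., np.newaxis]
```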

Related

How to use K-means clustering to visualise learnt features of a CNN model?

Recently I was going through the paper "Intriguing Properties of Contrastive Losses" (https://arxiv.org/abs/2011.02803). In section 3.2, the authors try to determine how well the SimCLR framework has allowed the ResNet-50 model to learn good-quality, generalised features that exhibit hierarchical properties. To achieve this, they apply K-means to intermediate features of the ResNet-50 model (intermediate meaning the output of blocks 2, 3, 4, ...) and quote the reason: "If the model learns good representations then regions of similar objects should be grouped together".
Final results: [figure: K-means feature visualisation]
I am trying to replicate the same procedure but with a different model (like VGGNet or Xception). Are there any resources explaining how to perform such visualisations?
The procedure would be as follows:
Let us assume that you want to visualise the 8th layer of VGG. Suppose this layer's output has the shape (64, 64, 256) (random numbers; this does not correspond to the actual VGG). This means that for one specific image you have 64 × 64 = 4096 vectors of dimension 256. Now you can apply K-means to these vectors (for example with 5 clusters) and then colour your image according to the clustering result. The colouring is easy, since the 64×64 feature map is a scaled-down version of your image: you just colour the corresponding image region for each of these vectors.
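A minimal sketch of this procedure with Keras and VGG16 (the layer name, stand-in input, and cluster count are illustrative choices, not values from the paper):

```python
import numpy as np
from sklearn.cluster import KMeans
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.models import Model

base = VGG16(weights="imagenet", include_top=False)
feature_model = Model(base.input, base.get_layer("block3_conv3").output)

img = np.random.rand(1, 224, 224, 3).astype("float32") * 255.0  # stand-in image
feats = feature_model.predict(preprocess_input(img))[0]         # shape (56, 56, 256)

h, w, c = feats.shape
vectors = feats.reshape(-1, c)        # one 256-dim vector per spatial location

labels = KMeans(n_clusters=5, n_init=10).fit_predict(vectors)
segmentation = labels.reshape(h, w)   # upsample to the image size to colour it
```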
I don't know whether it is a good idea to do the K-means clustering on the combined output of many images; theoretically, doing it on many images and on a single one should both give good results (although with many images you would probably increase the number of clusters to account for the higher variation in your feature vectors).

Reconstruction of shape from elliptic Fourier descriptors

I have extracted the elliptic Fourier descriptors for each otolith, but I couldn't figure out how to normalise them with respect to the first harmonic, nor how to reconstruct mean shapes from them for each station. I tried myself but couldn't get any results using the Momocs package. I need expert help with an R script. The data are in an Excel file.
to use "first harmonic" normalization, just pass efourier() with default parameters (ie with norm=TRUE).
Have a look to Details section in ?efourier since this is usually not the best way to go (and I think it's very valid for otoliths)
feel free to contact me directly !
all the best

Model suggestion: Keyword spotting

I want to detect occurrences of the word "repeat" in speech, as well as each occurrence's approximate duration. For this task I'm planning to build a deep learning model. I have around 50 positive and 50 negative utterances (I couldn't collect more).
Initially I searched for pretrained keyword-spotting models, but I couldn't find a good one.
Then I tried speech recognition models (Deep Speech), but they couldn't predict the exact "repeat" words, as my data has an Indian accent. I also think that using full ASR models for this task would be overkill.
Now I've split the audio into 1-second chunks with 50% overlap and tried binary audio classification on each chunk, i.e. whether the chunk contains the word "repeat" or not. For the classification model I computed MFCC features and built a sequence model on top of them. Nothing seems to work for me.
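For concreteness, the chunking and MFCC step looks roughly like this (a minimal sketch assuming librosa; the file name is a placeholder):

```python
import librosa
import numpy as np

audio, sr = librosa.load("speech.wav", sr=16000)  # placeholder file name
win = sr            # 1-second windows
hop = sr // 2       # 50% overlap

chunks = [audio[i:i + win] for i in range(0, len(audio) - win + 1, hop)]
# One (n_mfcc, n_frames) matrix per chunk, fed to the sequence model.
mfccs = [librosa.feature.mfcc(y=c, sr=sr, n_mfcc=13) for c in chunks]
```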
If anyone has already worked on this kind of task, please point me to the right method/resources for building a DL model for it. Thanks in advance!

Does any H2O algorithm support multi-label classification?

Does H2O's deep learning model, or any other H2O algorithm, support multi-label classification problems?
Original response variable (Tags):
apps, email, mail
finance,freelancers,contractors,zen99
genomes
gogovan
brazil,china,cloudflare
hauling,service,moving
ferguson,crowdfunding,beacon
cms,naytev
y,combinator
in,store,
conversion,logic,ad,attribution
After mapping them to the keys of a dictionary, the response variable looks like this:
[74]
[156, 89]
[153, 13, 133, 40]
[150]
[474, 277, 113]
[181, 117]
[15, 87, 8, 11]
Thanks
No, H2O only contains algorithms that learn to predict a single response variable at a time. You could turn each unique combination of labels into a single class and train a multi-class model that way, or predict each label with a separate model.
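For example, the "each unique combination becomes one class" idea, sketched with tag lists like those in the question:

```python
# Map each unique tag combination to one class id (label powerset).
tag_sets = [["apps", "email", "mail"], ["cms", "naytev"], ["apps", "email", "mail"]]
combo_to_class = {t: i for i, t in enumerate(dict.fromkeys(map(tuple, tag_sets)))}
y = [combo_to_class[tuple(tags)] for tags in tag_sets]   # [0, 1, 0]
```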
That said, any algorithm that produces a model giving you "finance,freelancers,contractors,zen99" for one set of inputs and "cms,naytev" for another is horribly over-fitted. You need to take a step back and think about what your actual question is.
But in lieu of that, here is one idea: train some word embeddings (or use pre-trained ones) on your answer words. You could then average the vectors for each set of values and hope this gives you a good numeric representation of the "topic". You then need to turn your, say, 100-dimensional averaged word vector into a single number (PCA comes to mind). Now you have a single number that you can give to a machine learning algorithm, and that it can predict.
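A rough sketch of that idea (random vectors stand in for real pre-trained embeddings):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
embeddings = {}   # word -> 100-dim vector; swap in real pre-trained vectors

def embed(word, dim=100):
    # Random stand-in for a pre-trained embedding lookup.
    if word not in embeddings:
        embeddings[word] = rng.normal(size=dim)
    return embeddings[word]

tag_sets = [["apps", "email", "mail"], ["cms", "naytev"], ["genomes"]]

# Average the word vectors of each tag set into one "topic" vector.
topic_vectors = np.stack([np.mean([embed(w) for w in tags], axis=0)
                          for tags in tag_sets])

# Collapse each 100-dim averaged vector to a single number for the model.
topic_scalar = PCA(n_components=1).fit_transform(topic_vectors).ravel()
```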
You still have a problem: having predicted a number, how do you turn that number back into a 100-dimensional vector, from there into a topic, and from there into topic words? Tricky, but maybe not impossible.
(As an aside, if you turn the above "single number" into a factor and have the machine learning model do a categorisation, predicting the most similar topic to those it has seen before... you've basically gone full circle and will get a model identical to the one you started with, with too many classes.)

Machine Learning for gesture recognition with Myo Armband

I'm trying to develop a model to recognise new gestures with the Myo Armband (an armband that has 8 electrical sensors and can recognise 5 hand gestures out of the box). I'd like to record the sensors' raw data for a new gesture and feed it to a model so it can recognise it.
I'm new to machine/deep learning and I'm using CNTK. I'm wondering what the best way to do this would be.
I'm struggling to understand how to create the trainer. For the input data, I'm thinking about using 20 sets of these 8 sensor values (each between -127 and 127), so one label is the output for 20 sets of values.
I don't really know how to do that. I've seen tutorials where images are linked with their labels, but it's not the same idea. And even after training is done, how can I keep the model from recognising this one gesture no matter what I do, since it's the only gesture it has been trained on?
An easy way to get started would be to create 161 columns (8 columns for each of the 20 time steps, plus the designated label). You would arrange the columns like
emg1_t01, emg2_t01, emg3_t01, ..., emg8_t20, gesture_id
This gives you the right 2D format for the various algorithms in sklearn, as well as for a feed-forward neural network in CNTK. You would use the first 160 columns to predict the 161st.
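For example (a sketch with random stand-in data of shape samples × 20 time steps × 8 sensors):

```python
import numpy as np

n_samples = 100                                        # stand-in data
recordings = np.random.randint(-127, 128, size=(n_samples, 20, 8))
gesture_ids = np.random.randint(0, 2, size=n_samples)  # binary: gesture or not

X = recordings.reshape(n_samples, 160)  # columns emg1_t01 ... emg8_t20
y = gesture_ids                         # the 161st column
```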
Once you have that working, you can model your data to better represent the natural time-series order it contains. You would move away from the 2D shape and instead create a 3D array to represent your data.
The first axis shows the number of samples
The second axis shows the number of time steps (20)
The third axis shows the number of sensors (8)
With this shape you're all set to use a 1D convolutional model (CNN) in CNTK that traverses the time axis to learn local patterns from one step to the next.
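For example, such a model could look like the following; this is a sketch written in Keras rather than CNTK for brevity, with illustrative layer sizes:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(20, 8)),            # 20 time steps, 8 EMG sensors
    layers.Conv1D(32, kernel_size=3, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation="sigmoid"),  # gesture vs. not-gesture
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```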
You might also want to look into RNNs, which are often used to work with time-series data. However, RNNs are sometimes hard to train, and a recent paper suggests that CNNs should be the natural starting point for working with sequence data.