To train the denoising autoencoder, I used x + n as the input data and x as the target data (x: original data, n: noise). After training was complete, I obtained noise-removed data from the denoising autoencoder (x_test + n_test -> x_test).
Then, as a test, I trained an autoencoder with the input and target data set to the same values, just like a conventional autoencoder (x -> x).
As a result, I obtained noise-removed data similar to that of the denoising autoencoder in the test phase.
Why is noise removed by the conventional autoencoder as well?
Please tell me the difference between these two autoencoders.
An autoencoder's purpose is to map high-dimensional data (e.g. images) to a compressed form (i.e. a hidden representation) and to reconstruct the original image from that hidden representation.
A denoising autoencoder, in addition to learning to compress data (like an autoencoder), learns to remove noise from images, which allows it to perform well even when the inputs are noisy. So denoising autoencoders are more robust than plain autoencoders, and they learn more features from the data than a standard autoencoder.
One of the uses of autoencoders was to find a good initialization for deep neural networks (in the late 2000s). However, with good initializations (e.g. Xavier) and activation functions (e.g. ReLU), that advantage has disappeared. Now they are used more in generative tasks (e.g. the variational autoencoder).
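For concreteness, here is a minimal PyTorch sketch of the two training setups described above (the network sizes, noise level, and placeholder data are illustrative assumptions, not from the question):

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    # Small fully connected autoencoder; layer sizes are arbitrary.
    def __init__(self, dim=784, hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden, dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()
sigma = 0.1  # assumed noise level
loader = [torch.rand(64, 784) for _ in range(10)]  # placeholder clean batches (assumption)

for x in loader:
    # Denoising autoencoder: corrupted input, clean target (x + n -> x).
    # For the conventional autoencoder, use noisy = x instead (x -> x).
    noisy = x + sigma * torch.randn_like(x)
    loss = criterion(model(noisy), x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()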
I have written code that transforms time series data (medical EEGs) into a scatter chart. I have labels available for a training set noting the presence or absence of 'seizures' in the data. I can see the seizures clearly (as a human) in the scatter plots. I thought it would be straightforward to adapt a pre-trained image classifier (AlexNet) to do the (seizure, no_seizure) binary classification. I have a training set of 500 chart images. The model is not converging.
I replaced the final AlexNet layer before training:
model.classifier[6] = torch.nn.Linear(model.classifier[6].in_features, 2)
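A minimal sketch of this kind of fine-tuning setup (the weights argument, the optional feature freezing, and the optimizer below are illustrative assumptions, using torchvision's pretrained AlexNet):

import torch
import torchvision

# Load a pretrained AlexNet and adapt it for binary classification.
model = torchvision.models.alexnet(weights=torchvision.models.AlexNet_Weights.DEFAULT)

# Optionally freeze the convolutional feature extractor (assumption: may help
# with a small training set of 500 images).
for p in model.features.parameters():
    p.requires_grad = False

# Replace the final fully connected layer with a 2-way output: (seizure, no_seizure).
model.classifier[6] = torch.nn.Linear(model.classifier[6].in_features, 2)

# Train only the parameters that still require gradients.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
criterion = torch.nn.CrossEntropyLoss()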
Do you have any advice for helping me with this challenge? Intuitively I thought classification of scatter charts would be easier than photograph image classification.
I am working with a long sequence (~60 000 timesteps) classification task with continuous input domain. The input has the shape (B, L, C) where B is the batch size, L is the sequence length (i.e. timesteps) and C is the number of features where each feature is continuous (i.e. values like 0.6, 0.2, 0.5, 1.3, etc.).
Since the sequence is very long, I can't directly apply an RNN or Transformer encoder layer without exceeding memory limits. Some proposed methods use several CNN layers to "downsample" the sequence length before feeding it into an RNN model. A successful example of this is the CNN-LSTM model. By stacking several convolutional blocks, each followed by max-pooling, it is possible to "downsample" the sequence length by a given factor. The downsampled sequence might then have a length of, say, 60 timesteps, which is much more manageable for an LSTM model.
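For illustration, a rough PyTorch sketch of this downsampling idea (the kernel sizes, channel counts, and pooling factors are assumptions chosen so that 60,000 timesteps reduce to 60):

import torch
import torch.nn as nn

class CNNDownsampler(nn.Module):
    # Reduce sequence length: (B, L, C) -> (B, L/1000, hidden) via conv + max-pool blocks.
    def __init__(self, in_channels, hidden=64):
        super().__init__()
        self.blocks = nn.Sequential(
            nn.Conv1d(in_channels, hidden, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(10),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(10),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(10),
        )

    def forward(self, x):           # x: (B, L, C)
        x = x.transpose(1, 2)       # Conv1d expects (B, C, L)
        x = self.blocks(x)          # (B, hidden, L/1000)
        return x.transpose(1, 2)    # back to (B, L/1000, hidden)

class CNNLSTMClassifier(nn.Module):
    def __init__(self, in_channels, num_classes, hidden=64):
        super().__init__()
        self.downsample = CNNDownsampler(in_channels, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):
        feats = self.downsample(x)      # (B, 60, hidden) when L = 60,000
        _, (h_n, _) = self.lstm(feats)  # final hidden state: (1, B, hidden)
        return self.head(h_n[-1])       # (B, num_classes)

model = CNNLSTMClassifier(in_channels=3, num_classes=2)  # assumed 3 features, 2 classes
logits = model(torch.randn(4, 60000, 3))                 # -> (4, 2)

The nn.LSTM above is the part a Transformer encoder would replace.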
Does it make sense to directly substitute the LSTM model with a Transformer encoder? I have read that the transformer attention mechanism can complement the LSTM layers and be used in succession.
There also exist many variants of Transformers and other architectures designed for handling long sequences. Recent examples include Performer, Linformer, Reformer, Nyströmformer, BigBird, FNet, S4, and CDIL-CNN. Does there exist a library, similar to torchvision, for using these models in PyTorch without copy-pasting large amounts of code from the respective repositories?
I have a series of sensors (around 4k), and each sensor measures the amplitude at each point. Suppose I train a neural network with a sufficient set of 4k-value readings (shape N x 4k). The network will find a pattern in the series of values. If the values stray away from that pattern (that is, an anomaly), it should detect the point and be able to say that the anomaly is in the 'X'th sensor. Is this possible? If so, what kind of neural network should I use?
Since you have time series inputs, you can use sequential models like RNN, LSTM, or GRU, with a softmax layer at the end that outputs (normal/anomaly); see the sketch below these suggestions.
You can use the same model (weights) 4k times to find which sensor is at fault.
Alternatively, the same sequential network can be trained with a multi-dimensional softmax output (anomaly1/normal1 ... anomaly4k/normal4k).
But such networks won't work well when the data is imbalanced (anomalies are rare).
You can also try RPCA (robust PCA) for anomaly detection.
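A minimal PyTorch sketch of the single-output variant (the hidden size, window length, and class weights are illustrative assumptions; the per-sensor variant would instead use one output per sensor):

import torch
import torch.nn as nn

class SensorAnomalyLSTM(nn.Module):
    # Classify a window of multi-sensor readings as normal vs. anomalous.
    def __init__(self, num_sensors=4000, hidden=128, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(num_sensors, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):            # x: (batch, timesteps, num_sensors)
        _, (h_n, _) = self.lstm(x)   # final hidden state: (1, batch, hidden)
        return self.head(h_n[-1])    # logits over (normal, anomaly)

model = SensorAnomalyLSTM()
logits = model(torch.randn(8, 50, 4000))  # e.g. 8 windows of 50 timesteps (assumed window length)

# CrossEntropyLoss applies the softmax internally; a class weight (assumed value)
# on the anomaly class is one way to cope with the imbalance mentioned above.
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 20.0]))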
I am trying to build an optical character recognition system that can recognize handwritten sentences using the LSTM cell.
Now, what I have understood from the literature is that you need to give two inputs to the LSTM cell: one is the image you are trying to recognize, and the second is the sequence of words it has already predicted. So, for example, if I had an image that read "I love machine learning", I would create the following pairs of inputs:
Image + startseq
Image + startseq + I
Image + startseq + I + love
So for each input you want the LSTM to predict the next word i.e. I, love, machine for the above sequences.
The problem I'm having is that I can't figure out how to input the image AND the previous sequence to the LSTM cell. Do I divide my image (a 2-D matrix) into row/column vectors and send them to the LSTM one at a time, and after that send in the previous sequence of words? But that way I'd have quite long input sequences, which might lead to long convergence times.
I know image captioning tasks vectorize input images using pretrained neural nets but can that be done for optical character recognition systems, i.e. would that cause accuracy issues?
No, you don't have to feed recognized words back into LSTM. You only feed an input (feature) sequence and the LSTM learns to propagate relevant information through this sequence.
You should think of input sequence and output sequence when talking about Recurrent Neural Networks (RNNs).
The input to an RNN at time-step t is:
the state of the memory cell at t-1
the input element at t
An LSTM has a more advanced internal structure than a vanilla RNN, which allows more robust training. But from a user's perspective it works just like a vanilla RNN: you input a sequence and the LSTM computes an output sequence for you.
When doing handwriting recognition, you usually extract a feature sequence from the input image (e.g. by using convolutional layers).
Then, you feed this feature sequence into LSTM layers.
You map the output sequence to a character-probability matrix which is then decoded into the final text by the CTC layer.
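As a rough PyTorch sketch of that CNN -> LSTM -> CTC pipeline (the image height, layer sizes, and character-set size are illustrative assumptions):

import torch
import torch.nn as nn

class HandwritingRecognizer(nn.Module):
    # CNN feature extractor -> LSTM -> per-timestep character logits for CTC decoding.
    def __init__(self, num_chars, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(                        # input: (B, 1, 32, W)
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )                                                # output: (B, 64, 8, W/4)
        self.lstm = nn.LSTM(64 * 8, hidden, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, num_chars + 1)  # +1 for the CTC blank label

    def forward(self, images):                 # images: (B, 1, 32, W)
        f = self.cnn(images)                   # (B, 64, 8, W/4)
        b, c, h, w = f.shape
        seq = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # feature sequence: (B, W/4, 512)
        out, _ = self.lstm(seq)                # (B, W/4, 2 * hidden)
        return self.head(out)                  # (B, W/4, num_chars + 1)

model = HandwritingRecognizer(num_chars=80)    # assumed alphabet size
logits = model(torch.randn(2, 1, 32, 128))     # -> (2, 32, 81)

# Training would use CTC loss on log-probabilities shaped (T, B, num_chars + 1):
#   log_probs = logits.log_softmax(2).permute(1, 0, 2)
#   loss = nn.CTCLoss(blank=80)(log_probs, targets, input_lengths, target_lengths)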
Here is a short tutorial on how to build a handwriting recognition system; it should give you an idea of which data (see "Data": "CNN output" and "RNN output") flows into the LSTM and which data flows out of it:
https://towardsdatascience.com/2326a3487cd5
In the code below, they use an autoencoder for supervised clustering or classification because they have data labels.
http://amunategui.github.io/anomaly-detection-h2o/
But can I use an autoencoder to cluster data if I do not have its labels?
Regards
The deep-learning autoencoder is always unsupervised learning. The "supervised" part of the article you link to is to evaluate how well it did.
The following example (taken from ch. 7 of my book, Practical Machine Learning with H2O, where I try all the H2O unsupervised algorithms on the same data set - please excuse the plug) takes 563 features and tries to encode them into just two hidden nodes.
m <- h2o.deeplearning(
  2:564, training_frame = tfidf,
  hidden = c(2), autoencoder = T, activation = "Tanh"
)
f <- h2o.deepfeatures(m, tfidf, layer = 1)
The second command there extracts the hidden node values (the learned features). f is a data frame with two numeric columns and one row for every row in the tfidf source data. I chose just two hidden nodes so that I could plot the clusters.
Results will change on each run. You can (maybe) get better results with stacked auto-encoders, or using more hidden nodes (but then you cannot plot them). Here I felt the results were limited by the data.
BTW, I made the above plot with this code:
d <- as.matrix(f[1:30,]) #Just first 30, to avoid over-cluttering
labels <- as.vector(tfidf[1:30, 1])
plot(d, pch = 17) #Triangle
text(d, labels, pos = 3) #pos=3 means above
(P.S. The original data came from Brandon Rose's excellent article on using NLTK.)
In some respects, encoding data and clustering data share overlapping theory. As a result, you can use autoencoders to cluster (encode) data.
A simple example to visualize: suppose you have a set of training data that you suspect has two primary classes, such as voter history data for Republicans and Democrats. If you take an autoencoder, encode the data into two dimensions, and plot it on a scatter plot, the clustering becomes much clearer. Below is a sample result from one of my models; you can see a noticeable split between the two classes, as well as a bit of expected overlap.
The code can be found here
This method is not restricted to two binary classes; you could also train on as many different classes as you wish. Two polarized classes are just easier to visualize.
This method is not limited to two output dimensions either; that was just for plotting convenience. In fact, you may find it difficult to meaningfully map certain high-dimensional spaces to such a small space.
In cases where the encoded (clustered) layer is larger in dimension, it is not as easy to "visualize" the feature clusters. This is where it gets a bit more difficult, as you'll have to use some form of supervised learning to map the encoded (clustered) features to your training labels.
A couple of ways to determine which class the features belong to are to feed the data into a kNN/clustering algorithm, or, what I prefer, to take the encoded vectors and pass them to a standard back-propagation neural network. Note that, depending on your data, you may find that feeding the data straight into your back-propagation neural network is sufficient; a sketch of both options follows.
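As a hedged sketch of that last step (assuming an array named encoded holds the bottleneck vectors produced by an already-trained encoder; scikit-learn's KMeans stands in for the clustering step and MLPClassifier for the small back-propagation network):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

# Placeholder for the bottleneck activations of a trained autoencoder,
# e.g. encoded = encoder.predict(X) in Keras (assumption).
encoded = np.random.rand(1000, 8)

# Unsupervised option: group the encoded vectors into clusters.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(encoded)

# Supervised option: if labels exist, map the encoded features to classes with
# a small feed-forward (back-propagation) network.
labels = np.random.randint(0, 2, size=1000)   # placeholder labels
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500).fit(encoded, labels)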