Sliding a Transformer model over a longer sequence - deep-learning

I have very long genome sequences on which I need to do some classification. What I want to try is to use a transformer to predict the next token from a 512-token chunk, slide this transformer over the whole sequence, and use the resulting outputs to work on the whole sequence.
For example: imagine I have a 250,000-token sequence. I would slide the transformer roughly 488 times (250,000 / 512 ≈ 488), producing 488 output tokens, concatenate these outputs to obtain a summary array of the sequence, and build a classifier on top of it.
I'm trying to find examples that could guide me in this direction, but I can hardly find any. Does this sound like a good idea? Where could I look for similar examples of sliding a transformer/LSTM over a longer sequence?
Thank you very much, I'd appreciate any pointers!
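A rough PyTorch sketch of the chunk-and-pool idea described above (the encoder architecture, the mean pooling, and all sizes are placeholders; pretraining the chunk model with next-token prediction is left out):

```python
import torch
import torch.nn as nn

class ChunkEncoder(nn.Module):
    """Encodes one 512-token chunk into a single summary vector."""
    def __init__(self, vocab_size=8, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, chunk):                 # chunk: (batch, 512)
        h = self.encoder(self.embed(chunk))   # (batch, 512, d_model)
        return h.mean(dim=1)                  # mean-pool to (batch, d_model)

def encode_long_sequence(model, tokens, chunk_len=512):
    """Slide over the full sequence and return one summary vector per chunk."""
    summaries = []
    for start in range(0, tokens.size(0) - chunk_len + 1, chunk_len):
        chunk = tokens[start:start + chunk_len].unsqueeze(0)   # (1, 512)
        with torch.no_grad():
            summaries.append(model(chunk))
    return torch.cat(summaries, dim=0)        # (~488, d_model) for 250k tokens

# The classifier then sits on top of the concatenated summaries, e.g.:
# seq = torch.randint(0, 8, (250_000,))
# features = encode_long_sequence(ChunkEncoder(), seq).flatten()
```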

Related

sequence to sequence model using pytorch

I have a sequence-to-sequence dataset: each sample input is a sequence of characters (drawn from an alphabet of 20 characters, max length 2166) and the output is a sequence of characters drawn from three classes G, H, B. For example: OIREDSSSRTTT ----> GGGHHHHBHBBB
I would like to build a simple PyTorch model that works on this type of dataset, i.e. a model that can predict a sequence of classes. I would appreciate any suggestions or links to a simple model that does the same.
Thanks
If the output sequence always has the same length as the input sequence, you might want to use a transformer encoder, because it basically transforms the inputs with attention over the context. You can also try anything that is used for sequence tagging: BiLSTM, BiGRU, etc.
If you want your model to be able to predict sequences of a different length (not necessarily the same as the input length), look at encoder-decoder models, such as the vanilla transformer.
You can start with the sequence tagging model from the PyTorch tutorial https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html .
As @Ilya Fedorov said, you can move to transformer models for potentially better performance.
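For the equal-length case, here is a minimal BiLSTM tagging sketch in PyTorch; the vocabulary size, dimensions, and random data are placeholders, and padding/masking for variable-length batches is omitted:

```python
import torch
import torch.nn as nn

class CharTagger(nn.Module):
    """BiLSTM tagger: one output class (G/H/B) per input character."""
    def __init__(self, n_chars=20, n_tags=3, emb_dim=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, x):            # x: (batch, seq_len) of character indices
        h, _ = self.lstm(self.embed(x))
        return self.out(h)           # (batch, seq_len, n_tags)

model = CharTagger()
x = torch.randint(0, 20, (4, 100))   # 4 sequences of 100 characters
y = torch.randint(0, 3, (4, 100))    # gold tags G/H/B encoded as 0/1/2
logits = model(x)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 3), y.reshape(-1))
loss.backward()
```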

Model suggestion: Keyword spotting

I want to detect occurrences of the word "repeat" in speech, as well as the word's approximate duration. For this task, I'm planning to build a deep learning model. I have around 50 positive and 50 negative utterances (I couldn't collect more).
Initially I searched for pretrained keyword-spotting models, but I couldn't find a good one.
Then I tried speech recognition models (DeepSpeech), but they couldn't transcribe the word "repeat" reliably because my data has an Indian accent. I also felt that using full ASR models for this task would be overkill.
Now I've split the entire audio into 1-second chunks with 50% overlap and tried binary audio classification on each chunk, i.e. whether the chunk contains the word "repeat" or not. For the classification model, I calculated MFCC features and built a sequence model on top of them. Nothing seems to work for me.
If anyone has already worked on this kind of task, please point me to a suitable method/resources for building a DL model for it. Thanks in advance!
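For reference, here is one way the chunking + MFCC pipeline described above could be sketched, assuming librosa for feature extraction; the GRU classifier, the window parameters, and the file path are illustrative only:

```python
import numpy as np
import librosa                      # assumed here for MFCC extraction
import torch
import torch.nn as nn

def chunk_mfcc(path, sr=16000, chunk_s=1.0, overlap=0.5, n_mfcc=13):
    """Split audio into 1 s chunks with 50% overlap and compute MFCCs per chunk."""
    y, sr = librosa.load(path, sr=sr)
    chunk = int(chunk_s * sr)
    hop = int(chunk * (1 - overlap))
    feats = []
    for start in range(0, len(y) - chunk + 1, hop):
        mfcc = librosa.feature.mfcc(y=y[start:start + chunk], sr=sr, n_mfcc=n_mfcc)
        feats.append(mfcc.T)                       # (frames, n_mfcc)
    return np.stack(feats)                         # (n_chunks, frames, n_mfcc)

class ChunkClassifier(nn.Module):
    """Binary classifier per chunk: contains "repeat" vs. not."""
    def __init__(self, n_mfcc=13, hidden=64):
        super().__init__()
        self.gru = nn.GRU(n_mfcc, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 2)

    def forward(self, x):                          # x: (batch, frames, n_mfcc)
        _, h = self.gru(x)                         # h: (2, batch, hidden)
        return self.out(torch.cat([h[0], h[1]], dim=1))

# chunks = torch.tensor(chunk_mfcc("utterance.wav"), dtype=torch.float32)
# logits = ChunkClassifier()(chunks)               # one prediction per chunk
```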

Anomaly Detection with Autoencoder using unlabelled Dataset (How to construct the input data)

I am new to the deep learning field, and I would like to ask about using an unlabeled dataset for anomaly detection with an autoencoder. My confusion starts with a few questions:
1) Some posts say to separate anomalous and non-anomalous samples (assuming the data is labelled) from the original dataset and to train the AE only on the non-anomalous data (which is usually the dominant part). So the question is: how am I going to separate my dataset if it is unlabeled?
2) If I train on the original unlabeled dataset, how do I detect the anomalous data?
Labels don't go into an autoencoder.
An autoencoder consists of two parts:
an encoder and a decoder.
Encoder: encodes the input data, e.g. a sample with 784 features, down to 50 features.
Decoder: converts those 50 features back to the original 784 features.
Now, to detect an anomaly:
if you pass in an unknown sample, it should be reconstructed close to the original without much loss.
But if there is a large reconstruction error, then it could be an anomaly.
Picture Credit: towardsdatascience.com
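A minimal PyTorch sketch of that reconstruction-error idea (the 784 -> 50 sizes follow the description above; the extra hidden layer and the threshold rule are illustrative choices):

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    """784 -> 50 -> 784 autoencoder, as in the description above."""
    def __init__(self, n_in=784, n_code=50):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, 256), nn.ReLU(), nn.Linear(256, n_code))
        self.decoder = nn.Sequential(nn.Linear(n_code, 256), nn.ReLU(), nn.Linear(256, n_in))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def anomaly_score(ae, x):
    """Per-sample reconstruction error; large values suggest anomalies."""
    with torch.no_grad():
        return ((ae(x) - x) ** 2).mean(dim=1)

ae = AE()
# Training minimises reconstruction error on the (mostly normal) unlabeled data:
#   loss = nn.MSELoss()(ae(batch), batch)
# At prediction time, flag samples whose anomaly_score exceeds a threshold chosen
# from the distribution of training-set scores (e.g. mean + 3 * std).
```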
I think you partly answered the question yourself: by definition, an anomaly is a rare event. So even if you don't know the labels, your training data will contain only very few such samples, and the model will predominantly learn what the data usually looks like. So both during training and at prediction time, the reconstruction error will be large for an anomaly. But since such examples come up only very seldom, they will not influence your embedding much.
In the end, if you can really justify that the anomaly you are checking for is rare, you might not need much pre-processing or labelling. If it occurs more often (a hard threshold is difficult to give, but I'd say anomalies should make up well below 1% of the data), your AE might pick up on that signal and you would really have to get labels in order to split the data. But then again: that would not be an anomaly any more, right? In that case you could go ahead and train a (balanced) classifier with this data.

What exactly is a timestep in an LSTM model?

I am a newbie to LSTMs and RNNs as a whole, and I've been racking my brain trying to understand what exactly a timestep is. I would really appreciate an intuitive explanation.
Let's start with a great image from Chris Olah's blog (a highly recommended read btw):
In a recurrent neural network you have multiple repetitions of the same cell. Inference works like this: you take some input (x0) and pass it through the cell to get output1 (depicted by the black arrow pointing right in the picture); then you pass output1 back into the same cell as input (possibly together with further input components, x1 in the image), producing output2; pass that into the same cell again (again possibly with an additional input component x2), producing output3, and so on.
A timestep is a single application of the cell: e.g. on the first timestep you produce output1 (h0), on the second timestep you produce output2, and so on.
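To make the unrolling concrete, here is a tiny PyTorch loop in which each iteration of the loop is one timestep (all sizes are arbitrary):

```python
import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=10, hidden_size=20)   # the single, reused cell
x = torch.randn(5, 3, 10)                           # 5 timesteps, batch of 3, 10 features

h = torch.zeros(3, 20)                              # hidden state
c = torch.zeros(3, 20)                              # cell state
outputs = []
for t in range(x.size(0)):                          # one iteration == one timestep
    h, c = cell(x[t], (h, c))                       # same cell, new x_t, previous (h, c)
    outputs.append(h)

# nn.LSTM performs exactly this loop internally; the sequence length you feed it
# is the number of timesteps.
```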

Machine Learning for gesture recognition with Myo Armband

I'm trying to develop a model to recognize new gestures with the Myo Armband (an armband that has 8 electrical sensors and can recognize 5 hand gestures). I'd like to record the sensors' raw data for a new gesture and feed it to a model so it can recognize that gesture.
I'm new to machine/deep learning and I'm using CNTK. I'm wondering what would be the best way to do it.
I'm struggling to understand how to create the trainer. For the input data, I'm thinking of using 20 sets of these 8 values (each between -127 and 127), so one label corresponds to 20 sets of values.
I don't really know how to do that; I've seen tutorials where images are linked with their labels, but it's not the same idea. And even after training is done, how can I prevent the model from predicting this one gesture no matter what I feed it, since it's the only gesture it has been trained on?
An easy way to get you started would be to create 161 columns (8 columns for each of the 20 time steps + the designated label). You would rearrange the columns like
emg1_t01, emg2_t01, emg3_t01, ..., emg8_t20, gesture_id
This will give you the right 2D format to use various algorithms in sklearn as well as a feed-forward neural network in CNTK. You would use the first 160 columns to predict the 161st.
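A small numpy/pandas sketch of that flattening (the data here is random and only illustrates the shapes and column order):

```python
import numpy as np
import pandas as pd

# raw: (n_samples, 20 timesteps, 8 sensors), labels: (n_samples,)
raw = np.random.randint(-127, 128, size=(100, 20, 8))
labels = np.random.randint(0, 5, size=100)

flat = raw.reshape(len(raw), 20 * 8)                        # 160 feature columns
cols = [f"emg{s+1}_t{t+1:02d}" for t in range(20) for s in range(8)]
df = pd.DataFrame(flat, columns=cols)
df["gesture_id"] = labels                                   # the 161st column
```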
Once you have that working you can model your data to better represent the natural time series order it contains. You would move away from a 2D shape and instead create a 3D array to represent your data.
The first axis is the number of samples
The second axis is the number of time steps (20)
The third axis is the number of sensors (8)
With this shape you're all set to use a 1D convolutional model (CNN) in CNTK that traverses the time axis to learn local patterns from one step to the next.
You might also want to look into RNNs, which are often used for time series data. However, RNNs are sometimes hard to train, and a recent paper suggests that CNNs should be the natural starting point for working with sequence data.
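Here is a sketch of the 3D layout plus a small 1D CNN over the time axis; it is written in PyTorch for brevity rather than CNTK, and all layer sizes are illustrative:

```python
import torch
import torch.nn as nn

x = torch.randn(100, 20, 8)                          # (samples, timesteps, sensors)

class GestureCNN(nn.Module):
    def __init__(self, n_sensors=8, n_gestures=5):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_sensors, 32, kernel_size=3, padding=1),  # slides along time
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                             # pool over time
        )
        self.out = nn.Linear(32, n_gestures)

    def forward(self, x):                            # x: (batch, timesteps, sensors)
        x = x.transpose(1, 2)                        # Conv1d expects (batch, channels, time)
        return self.out(self.conv(x).squeeze(-1))

logits = GestureCNN()(x)                             # (100, 5) gesture scores
```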