As I learn more and more about ML (I am a mobile DEV) I'm starting to form an analogy in my head, and I would like the community's opinion / validation.
As a front end DEV you have a backend and an API that you can make requests to. The standard format for the inputs and outputs to the API is JSON.
I'm running into a problem with ML models that I am trying to use, where I don't know how to read the expected input (the "API") and I don't know how to decode the output.
So far my experience has been fragmented, because some models say "give me an image of shape [1, 2, 120, 120]" or something like that.
To make the analogy concrete: is there a unified way to define inputs and outputs for an ML model, the way JSON unifies the inputs and outputs of a backend API?
If so, what are some rules one must follow to encode and decode data into this format?
Assuming this "ML model" is in the context of running an input through, say, a trained PyTorch model's forward pass to get an output, the unified way to define inputs and outputs for an ML model is through tensors. A tensor is essentially a multi-dimensional matrix containing elements of a single data type. Think multi-dimensional lists with a single data type.
Tensors:MLModels::JSON:WebAPI
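For example, a minimal PyTorch sketch of what a tensor is (just to show shape and dtype):

import torch

# A tensor is an n-dimensional array with a single data type and a shape
x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
print(x.shape)   # torch.Size([2, 3]) -> 2 rows, 3 columns
print(x.dtype)   # torch.float32 -> every element shares this dtype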
An Example using an Object Detector
Model
Let's say the model in your image example is an object detector that takes in an image as input and outputs either "dog" or "cat".
The input would usually be:
A tensor representation of an image with the shape [1, 2, 120, 120], where 1 is the batch size, 2 is the number of channels (for a standard RGB image this would be 3), and 120 x 120 is the height and width of the image.
The output would usually be:
A normalized tensor with two entries, like [0.7, 0.3], where index 0 is the probability that the image depicts a dog and index 1 is the probability that it's a cat.
Encoding and Decoding
Decoding the output to a string like "dog" or "cat" is straightforward: take the index with the highest probability and map it back to its label.
Encoding an image is slightly less obvious. At its heart, an image already has the form of a tensor: a multi-dimensional matrix containing a single data type. So it is still intuitive to encode a JPEG or PNG into a tensor representation through the RGB channel dimensions and the pixel values for each channel. Typically, image files are loaded using libraries and methods like the Python Imaging Library (PIL) and PyTorch's torchvision.transforms.ToTensor().
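A rough sketch of both directions, using PIL and torchvision as mentioned above (the file name pet.jpg, the 120 x 120 size, and the output values are placeholders):

from PIL import Image
import torch
from torchvision import transforms

# Encode: load an image file and turn it into a [channels, height, width] tensor
img = Image.open("pet.jpg").resize((120, 120))   # hypothetical file
x = transforms.ToTensor()(img)                   # shape [3, 120, 120], values in [0, 1]
x = x.unsqueeze(0)                               # add the batch dimension -> [1, 3, 120, 120]

# Decode: map the model's output tensor back to a human-readable label
labels = ["dog", "cat"]
output = torch.tensor([[0.7, 0.3]])              # pretend this came from model(x)
print(labels[output.argmax(dim=1).item()])       # "dog"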
This example is very specific to an object-detector-type model, but most supervised ML models will output a tensor like the one above or a one-hot label. And ML models in general have data inputs and outputs that can be represented as tensors.
Related
The idea of using BertTokenizer from huggingface really confuses me.
When I use
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
tokenizer.encode_plus("Hello")
Is the result somewhat similar to passing a one-hot vector representing "Hello" to a learned embedding matrix?
How is
BertTokenizer.from_pretrained("bert-base-uncased")
different from
BertTokenizer.from_pretrained("bert-**large**-uncased")
and other pretrained?
The encode_plus and encode functions tokenize your text and prepare it in the input format the BERT model expects. So yes, you can see the result as playing a role similar to the one-hot vector in your example: the returned input_ids are integer indices into the embedding matrix, which is mathematically equivalent to multiplying one-hot vectors by that matrix.
The encode_plus returns a BatchEncoding consisting of input_ids, token_type_ids, and attention_mask.
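For example (the exact token ids depend on the vocabulary, so treat the printed values below as roughly illustrative):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer.encode_plus("Hello")
print(encoded)
# roughly: {'input_ids': [101, 7592, 102],      # [CLS] hello [SEP]
#           'token_type_ids': [0, 0, 0],
#           'attention_mask': [1, 1, 1]}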
The pre-trained models differ in the number of encoder layers: the base model has 12 encoder layers and the large model has 24.
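You can check this yourself from the model configs (attribute name from the transformers library):

from transformers import BertConfig

print(BertConfig.from_pretrained("bert-base-uncased").num_hidden_layers)   # 12
print(BertConfig.from_pretrained("bert-large-uncased").num_hidden_layers)  # 24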
I'm quite new to PyTorch and I'm trying to build a net composed only of linear layers that will get a list of objects as input and output some score (a scalar) for each object. I'm wondering whether my input tensor's dimensions should be (batch_size, list_size, object_size) or whether I should flatten each list and get (batch_size, list_size*object_size). According to my understanding, in the first option I will have an output dimension of (batch_size, list_size, 1) and in the second (batch_size, list_size). Does it matter? I read the documentation but it still wasn't very clear to me.
If you want to do the classification for each object in your input, you should keep the objects separate from each other; i.e., your input should have the shape (batch_size, list_size, object_size). Then, given the number of classes you have (say m classes), the linear layer will transform the input to the shape (batch_size, list_size, m). In this case you get m scores for each object, which can be used to predict its class label.
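A small PyTorch sketch with made-up sizes, just to show that nn.Linear acts on the last dimension and keeps the objects separate:

import torch
import torch.nn as nn

batch_size, list_size, object_size, m = 4, 10, 16, 3   # hypothetical sizes
x = torch.randn(batch_size, list_size, object_size)

scorer = nn.Linear(object_size, m)   # applied to the last dimension only
scores = scorer(x)
print(scores.shape)                  # torch.Size([4, 10, 3]) -> m scores per object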
But now the question arises: why do we flatten in neural networks at all? The answer is simple: because you want to couple all the information within a single input (in your specific case, the pieces of information are the objects) to see whether they somehow affect each other and, if so, whether your network is able to learn these features/patterns. In practice, given the nature of your problem and the data you are working with, if different objects really do relate to each other, your network will be able to learn those relationships.
I am working on predicting Semantic Textual Similarity (SemEval 2017 Task 1) between a pair of texts. The similarity score (output) is a continuous value in the range [0, 5]. The neural network model (link below) therefore has 6 units in the final layer for predicting values in [0, 5]. The objective function used is the Pearson correlation coefficient, and softmax activation is used. Now, in order to train the model, how can I give the target output values to the model? Since there are 6 output classes, I should probably send one-hot-encoded vectors of the output. In that case, how can we convert the output (which might be a float value such as 2.33) to a one-hot vector of length 6? Or is there any other way of specifying the target output and training the model?
Paper: http://nlp.arizona.edu/SemEval-2017/pdf/SemEval016.pdf
If the value you're trying to predict is continuous, you might be better off configuring this as a regression architecture. This will be simpler to train and interpret, and will give you non-integer predictions (which you can then bucket or threshold however you please).
To do this, replace your softmax layer with a layer containing a single neuron with a linear activation function. Then you can train the network directly on your real-valued similarity scores. For the loss function you can use MSE / L2, unless you have a reason to do otherwise.
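A minimal Keras-style sketch of such a regression head, assuming a hypothetical fixed-size representation of the text pair (your actual encoder from the paper would replace the placeholder layers):

from keras.models import Model
from keras.layers import Input, Dense

# Hypothetical fixed-size feature vector describing a pair of texts
pair_features = Input(shape=(256,))
hidden = Dense(128, activation='relu')(pair_features)

# Regression head: one neuron, linear activation, trained on the raw [0, 5] scores
similarity = Dense(1, activation='linear')(hidden)

model = Model(pair_features, similarity)
model.compile(optimizer='adam', loss='mse')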
I'm totally new to Caffe and I'm trying to convert a TensorFlow model to Caffe.
I have a tuple whose shape is a little complex because it stores some word vectors.
This is the shape of the tuple data:
data[0]: a list, [684, 84], stores the sentence vector;
data[1]: a list, [684, 84], stores the position vector;
data[2]: a matrix, [684, 10], stores the aspects of the sentence;
data[3]: a matrix, [1, 684], stores the label of each sentence;
data[4]: a number, stores the max length of sentences;
Each row represents a sentence, which is also one sample of the dataset.
In TF, I return the whole tuple from a function I wrote myself.
train_data = read_data(FLAGS.train_data, source_count, source_word2idx)
I noticed that Caffe always requires a data layer before training, but I have no idea how to convert my data to LMDB format, or how to just send it into the model as a tuple or matrix.
By the way, I'm using pycaffe.
Could anyone help?
Thanks a lot!
There's no particular magic; all you need to do is write an input routine that reads the file and returns the data in the format expected for train_data. You do not need to pre-convert your data to LMDB or any other format; just write read_data to accept your current input format and give the model the format it requires.
We can't help you much beyond that: you haven't specified the model's format at all, and you've given us only the shapes of the input data (no internal structure or semantics). Simply treat it as a data-wrangling problem: figure out how to reorganize the input you have into the format the model expects.
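One common pattern with pycaffe is to skip LMDB entirely and fill the network's input blobs from numpy arrays before each solver step. A rough sketch, assuming your prototxt declares Input blobs named "data" and "label" (the file names and shapes below are placeholders):

import numpy as np
import caffe

# Hypothetical reader playing the role of your TensorFlow read_data():
# it only has to return numpy arrays shaped the way the prototxt expects.
def read_data(prefix):
    sentences = np.load(prefix + "_sentences.npy").astype(np.float32)   # e.g. (684, 84)
    labels = np.load(prefix + "_labels.npy").astype(np.float32)         # e.g. (684,)
    return sentences, labels

sentences, labels = read_data("train")

solver = caffe.SGDSolver("solver.prototxt")
batch_size = solver.net.blobs["data"].data.shape[0]

# Feed one batch at a time into the input blobs and take a solver step
for start in range(0, len(sentences) - batch_size + 1, batch_size):
    solver.net.blobs["data"].data[...] = sentences[start:start + batch_size]
    solver.net.blobs["label"].data[...] = labels[start:start + batch_size]
    solver.step(1)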
I am new to keras and despite reading the documentation and the examples folder in keras, I'm still struggling with how to fit everything together.
In particular, I want to start with a simple task: I have a sequence of tokens, where each token has exactly one label. I have a lot of training data like this - practically infinite, as I can generate more (token, label) training pairs as needed.
I want to build a network to predict labels given tokens. The number of tokens must always be the same as the number of labels (one token = one label).
And I want this to be based on all surrounding tokens, say within the same line or sentence or window -- not just on the preceding tokens.
How far I got on my own:
created the training numpy vectors, where I converted each sentence into a token vector and a label vector (of the same length), using token-to-int and label-to-int mappings
wrote a model using categorical_crossentropy and one LSTM layer, based on https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py.
Now struggling with:
All the input_dim and input_shape parameters... since each sentence has a different length (different number of tokens and labels in it), what should I put as input_dim for the input layer?
How to tell the network to use the entire token sentence for prediction, not just one token? How to predict a whole sequence of labels given a sequence of tokens, rather than just label based on previous tokens?
Does splitting the text into sentences or windows make any sense? Or can I just pass a vector for the entire text as a single sequence? What is a "sequence"?
What are "time slices" and "time steps"? The documentation keeps mentioning that and I have no idea how that relates to my problem. What is "time" in keras?
Basically I have trouble connecting the concepts from the documentation like "time" or "sequence" to my problem. Issues like Keras#40 didn't make me any wiser.
Pointing to relevant examples on the web or code samples would be much appreciated. Not looking for academic articles.
Thanks!
If you have sequences of different lengths, you can either pad them or use a stateful RNN implementation in which the activations are saved between batches. The former is the easiest and the most commonly used.
If you want to use future information with RNNs, you want a bidirectional model, where you concatenate two RNNs moving in opposite directions over the sequence. A plain RNN only uses a representation of the preceding tokens when predicting.
If you have very long sentences it might be useful to sample a random sub-sequence and train on that, e.g. 100 characters. This also helps with overfitting.
Time steps are your tokens. A sentence is a sequence of characters/tokens.
I've written an example of how I understand your problem, but it's not tested, so it might not run. Instead of using integers to represent your data I suggest one-hot encoding where possible, and then using binary_crossentropy instead of mse (see the small sketch after the code below).
from keras.models import Model
from keras.layers import Input, LSTM, TimeDistributed, Dense, merge
from keras.preprocessing import sequence
# Make sure all sequences are of same length
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_train = X_train[:, :, None]  # add the size-1 feature dimension the Input layer expects
# The input shape is your sequence length and your token embedding size (which is 1)
inputs = Input(shape=(maxlen, 1))
# Build a bidirectional RNN (return_sequences=True gives one output per timestep)
lstm_forward = LSTM(128, return_sequences=True)(inputs)
lstm_backward = LSTM(128, return_sequences=True, go_backwards=True)(inputs)
# Note: go_backwards emits its outputs in reverse time order; the Bidirectional
# wrapper in newer Keras versions handles this re-alignment for you.
bidirectional_lstm = merge([lstm_forward, lstm_backward], mode='concat', concat_axis=2)
# Output each timestep into a fully connected layer with linear
# output to map to an integer
sequence_output = TimeDistributed(Dense(1, activation='linear'))(bidirectional_lstm)
# Dense(n_classes, activation='sigmoid') if you want to classify
model = Model(inputs, sequence_output)
model.compile('adam', 'mse')
model.fit(X_train, y_train)
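And the one-hot encoding suggested above could look roughly like this (label values and class count are made up):

from keras.utils.np_utils import to_categorical
import numpy as np

# Hypothetical integer labels for one 4-token sentence, with 5 possible label classes
y_int = np.array([2, 0, 4, 1])
y_onehot = to_categorical(y_int, 5)   # shape (4, 5), one one-hot row per token
y_train = y_onehot.reshape(1, 4, 5)   # add the batch dimension the model expects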