I am working on a project that feeds a combination of text embeddings and image vectors into a DNN to produce the final result. For the word-embedding part I am using TF Hub's ELECTRA, while for the image part I am using a NASNet Mobile network.
However, the issue I am facing is that the word-embedding part, using the code shown below, just keeps running nonstop. It has been over 2 hours now and my training dataset has only 14900 rows of tweets.
Note - The input to the function is just a list of 14900 tweets.
tfhub_handle_encoder="https://tfhub.dev/google/electra_small/2"
tfhub_handle_preprocess="https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"
# Load Models
bert_preprocess_model = hub.KerasLayer(tfhub_handle_preprocess)
bert_model = hub.KerasLayer(tfhub_handle_encoder)
def get_text_embedding(text):
    preprocessing_layer = hub.KerasLayer(tfhub_handle_preprocess, name='Preprocessor')
    encoder_inputs = preprocessing_layer(text)
    encoder = hub.KerasLayer(tfhub_handle_encoder, trainable=True, name='Embeddings')
    outputs = encoder(encoder_inputs)
    text_repr = outputs['pooled_output']
    text_repr = tf.keras.layers.Dense(128, activation='relu')(text_repr)
    return text_repr
text_repr = get_text_embedding(train_text)
Is there a faster way to get text representation using these models?
Thanks for the help!
The operation performed in the code is quadratic in nature. While I managed to execute your snippet with 10000 samples within a few minutes, a 14900-sample input ran out of memory on a 32 GB RAM runtime. Is it possible that your runtime is swapping?
It is not clear what the snippet is trying to achieve. Do you intend to train a model? In that case you can define text_input as an Input layer and use fit to train. Here is an example: https://www.tensorflow.org/text/tutorials/classify_text_with_bert#define_your_model
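A minimal sketch of that pattern, assuming the same two TF Hub handles defined above (the build_text_model function and layer names are illustrative, not from your code); keeping preprocessing and encoding inside the model graph lets fit stream the tweets through the encoder in batches instead of pushing all 14900 of them through at once:
import tensorflow as tf
import tensorflow_hub as hub

def build_text_model():
    # Raw tweet strings go in as a batched string Input
    text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
    preprocessed = hub.KerasLayer(tfhub_handle_preprocess, name='preprocessing')(text_input)
    encoder_outputs = hub.KerasLayer(tfhub_handle_encoder, trainable=True, name='encoder')(preprocessed)
    text_repr = tf.keras.layers.Dense(128, activation='relu')(encoder_outputs['pooled_output'])
    return tf.keras.Model(text_input, text_repr)

model = build_text_model()
# model.compile(...) and then model.fit(train_text, train_labels, batch_size=32)
# (train_labels being whatever target you train against) processes the data batch by batch.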
Case 1:
I am feeding variable-length time-series windows to a GRU model. Sometimes a window contains 900 samples and sometimes only 16. I chose an RNN (GRU) because I learned that RNN methods work better on long sequences. I use one GRU layer and return the hidden states across all time steps, in order to retain as much information as possible from every time step. Then I apply average pooling to the GRU output to bring the representation to a fixed length. The intuition behind using average pooling instead of max pooling is that it may give a summary of all the time steps. Here is the code of the model:
input_layer = tf.keras.Input(shape=input_shape, name="time_series_activity")
input_mask = tf.keras.layers.Masking(mask_value=0.00000)(input_layer)
gru_l5 = tf.keras.layers.GRU(64, activation='tanh', recurrent_activation='sigmoid',
                             recurrent_initializer=tf.keras.initializers.Orthogonal(),
                             dropout=0.5, recurrent_dropout=0.5, return_sequences=True)(input_mask)
AP = tf.keras.layers.GlobalAveragePooling1D()(gru_l5)
gru_fm = tf.keras.layers.Dropout(0.3)(AP)
output_layer = tf.keras.layers.Dense(total_classes, activation="softmax")(gru_fm)
return tf.keras.models.Model(inputs=input_layer, outputs=output_layer)
With this model I get better performance on the validation set, while on the training data accuracy climbs to 100% (which is a bad sign); the major issue, however, is that the validation loss is "nan". This issue has been discussed on GitHub and StackOverflow.
I tried nearly all of the options provided here, here and here, but I have been unable to resolve this validation_loss = nan issue.
Case 2:
Then I decided not to take all of the GRU's hidden states but to retrieve only the last hidden state, which gives a fixed-length representation and eliminates the need for pooling. Here the "nan" validation loss problem is fixed, but the test data performance drops drastically. Here is this model's source code:
input_layer = tf.keras.Input(shape=input_shape, name="time_series_activity")
input_mask = tf.keras.layers.Masking(mask_value=0.00000)(input_layer)
gru_l5 = tf.keras.layers.GRU(64, activation='tanh', recurrent_activation='sigmoid',
                             recurrent_initializer=tf.keras.initializers.Orthogonal(),
                             dropout=0.5, recurrent_dropout=0.5)(input_mask)
gru_fm = tf.keras.layers.Dropout(0.3)(gru_l5)
output_layer = tf.keras.layers.Dense(total_classes, activation="softmax")(gru_fm)
return tf.keras.models.Model(inputs=input_layer, outputs=output_layer)
We can observe the results of both cases. In Case 1, my feeling is that the vanishing-gradient problem occurs with the longer sequences. Any thoughts or discussion on resolving this "nan" issue while achieving high performance would be much appreciated.
I'm working on a supervised deep-learning model to estimate depth images from monocular images.
The dataset currently uses KITTI data: the RGB input images come from the KITTI Raw data, and the ground truth comes from the following link.
In the process of training a simple encoder-decoder network that I designed, the results were not very good, so I am trying various approaches.
While searching for methods, I found that the ground truth contains many invalid areas, i.e., values that cannot be used, as shown in the image below, so training should only use the valid areas, selected by masking.
So I trained with masking, but I am curious why this result keeps coming out.
This is the training part of my code.
How can I fix this problem?
for epoch in range(num_epoch):
    model.train()  ### train ###
    for batch_idx, samples in enumerate(tqdm(train_loader)):
        x_train = samples['RGB'].to(device)
        y_train = samples['groundtruth'].to(device)
        pred_depth = model.forward(x_train)

        valid_mask = y_train != 0  #### Here is masking
        valid_gt_depth = y_train[valid_mask]
        valid_pred_depth = pred_depth[valid_mask]

        loss = loss_RMSE(valid_pred_depth, valid_gt_depth)
As far as I can understand, you are trying to estimate depth from an RGB image as input. This is an ill-posed problem, since the same input image can correspond to multiple plausible depth maps. You would need to integrate additional techniques to estimate accurate depth instead of simply taking an L1 or L2 loss between the predicted depth and the corresponding ground-truth depth image.
I would suggest going through some papers on estimating depth from single images, such as Depth Map Prediction from a Single Image using a Multi-Scale Deep Network, where they use one network to first estimate the global structure of the given image and then a second network that refines the local scene information. Instead of taking a simple RMSE loss, as you did, they use a scale-invariant error function in which the relationship between points is measured.
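As a rough illustration, the scale-invariant log error from that paper could be written as follows in PyTorch (a sketch, assuming the masked depths are strictly positive and using lambda = 0.5 as in the paper; the function name is illustrative):
import torch

def scale_invariant_loss(valid_pred_depth, valid_gt_depth, lam=0.5):
    # d_i = log(prediction) - log(ground truth), computed on valid pixels only
    d = torch.log(valid_pred_depth) - torch.log(valid_gt_depth)
    n = d.numel()
    # mean squared log error minus a term that forgives a consistent scale offset
    return (d ** 2).mean() - lam * (d.sum() ** 2) / (n ** 2)
It could be dropped in where loss_RMSE is called in the training loop above.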
I'm new to deep learning and I started with the TensorFlow tutorials (the beginner one and the expert one). In both of them, the data is imported at the beginning with these two lines:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
I would like to use this neural network on my own images. I have 100,000 images and a fileLabel.txt giving the labels for each image in order. Is there a way to change these two lines, or a few others, to import my images without breaking all the code? I really don't see how to do that; I have the impression that the mnist structure is specific to the images of the tutorial.
Thanks in advance for your help
The short answer to your question is yes, it's possible. You don't need to break any code IF your data is also similar to the MNIST data, with 10 labels and well organized.
Assuming that is not the case, then you need to organize your input data so that you can define (create) your model.
Organizing your input data includes:
Having consistent image sizes (for example MNIST is 28x28 pixel images)
Labeling of your images (for example MNIST has 10 labels - 0 to 9)
Finally, how you intend to split your data (for example, the MNIST data is split into three parts: 55,000 data points of training data (mnist.train), 10,000 points of test data (mnist.test), and 5,000 points of validation data (mnist.validation)); see the small sketch below.
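For example, reading the labels and splitting them might look like this (a sketch based on the fileLabel.txt mentioned in the question, assuming one integer label per line; the 80k/10k/10k split is an arbitrary choice):
# Read one label per line from fileLabel.txt (assumed format)
with open('fileLabel.txt') as f:
    labels = [int(line.strip()) for line in f]

# Split the 100,000 labels into train / test / validation sets
train_labels = labels[:80000]
test_labels = labels[80000:90000]
val_labels = labels[90000:]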
Once you have organized your input data, you read it by writing a small function like read_images that does something like
reader = tf.WholeFileReader()
key, value = reader.read(filename_queue)
....
Then assuming you want to "label" ahead of time similar to MNIST data, you can store them in a file and read them in your program.
After that, you would have to populate the tf.train.string_input_producer() with a list of strings containing the filename and the label.
....
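For reference, a minimal sketch of such a read_images function using the TF 1.x queue API (the file names are illustrative, and the label handling via the filename/label strings mentioned above is omitted):
import tensorflow as tf

def read_images(filename_queue):
    # Read one whole image file from the queue of filenames
    reader = tf.WholeFileReader()
    key, value = reader.read(filename_queue)
    image = tf.image.decode_png(value, channels=1)    # decode the raw bytes
    image = tf.image.resize_images(image, [28, 28])   # enforce a consistent size
    return key, image

# Queue of image filenames (paths are illustrative)
filename_queue = tf.train.string_input_producer(['img_00000.png', 'img_00001.png'])
key, image = read_images(filename_queue)

with tf.Session() as sess:
    # Queue runners must be started before anything can be dequeued
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    img = sess.run(image)
    coord.request_stop()
    coord.join(threads)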
I am new to keras and despite reading the documentation and the examples folder in keras, I'm still struggling with how to fit everything together.
In particular, I want to start with a simple task: I have a sequence of tokens, where each token has exactly one label. I have a lot of training data like this - practically infinite, as I can generate more (token, label) training pairs as needed.
I want to build a network to predict labels given tokens. The number of tokens must always be the same as the number of labels (one token = one label).
And I want this to be based on all surrounding tokens, say within the same line or sentence or window -- not just on the preceding tokens.
How far I got on my own:
created the training numpy vectors, where I converted each sentence into a token vector and a label vector (of the same length), using token-to-int and label-to-int mappings
wrote a model using categorical_crossentropy and one LSTM layer, based on https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py.
Now struggling with:
All the input_dim and input_shape parameters... since each sentence has a different length (different number of tokens and labels in it), what should I put as input_dim for the input layer?
How to tell the network to use the entire token sentence for prediction, not just one token? How to predict a whole sequence of labels given a sequence of tokens, rather than just label based on previous tokens?
Does splitting the text into sentences or windows make any sense? Or can I just pass a vector for the entire text as a single sequence? What is a "sequence"?
What are "time slices" and "time steps"? The documentation keeps mentioning that and I have no idea how that relates to my problem. What is "time" in keras?
Basically I have trouble connecting the concepts from the documentation like "time" or "sequence" to my problem. Issues like Keras#40 didn't make me any wiser.
Pointing to relevant examples on the web or code samples would be much appreciated. Not looking for academic articles.
Thanks!
If you have sequences of different length you can either pad them or use a stateful RNN implementation in which the activations are saved between batches. The former is the easiest and most used.
If you want to use future information with RNNs, you want a bidirectional model, where you concatenate two RNNs moving in opposite directions. A plain RNN only uses a representation of the previous information when, e.g., predicting.
If you have very long sentences it might be useful to sample a random sub-sequence, e.g. 100 characters, and train on that. This also helps with overfitting.
Time steps are your tokens. A sentence is a sequence of characters/tokens.
I've written an example of how I understand your problem but it's not tested so it might not run. Instead of using integers to represent your data I suggest one-hot encoding if it is possible and then use binary_crossentropy instead of mse.
import numpy as np
from keras.models import Model
from keras.layers import Input, Dense, LSTM, TimeDistributed, Bidirectional
from keras.preprocessing import sequence

# Make sure all sequences are of the same length
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
# Add a feature dimension: (n_samples, maxlen) -> (n_samples, maxlen, 1)
X_train = np.expand_dims(X_train, -1)

# The input shape is your sequence length and your token embedding size (which is 1)
inputs = Input(shape=(maxlen, 1))

# Bidirectional RNN: two LSTMs moving in opposite directions whose per-timestep
# outputs are concatenated (this replaces the older manual merge() of a forward
# and a go_backwards LSTM)
bidirectional_lstm = Bidirectional(LSTM(128, return_sequences=True), merge_mode='concat')(inputs)

# Output each timestep into a fully connected layer with linear
# output to map to an integer
sequence_output = TimeDistributed(Dense(1, activation='linear'))(bidirectional_lstm)
# Use Dense(n_classes, activation='sigmoid') instead if you want to classify

model = Model(inputs, sequence_output)
model.compile('adam', 'mse')
model.fit(X_train, y_train)
I am wondering how to go about setting up ONLY a test phase in Caffe for an LMDB file. I have already trained my model, everything seems good, my loss has decreased, and the output I am getting on images loaded in one by one also seem good.
Now I would like to see how my model performs on a separate LMDB test set, but seem to be unable to do so successfully. It would not be ideal for me to do a loop by loading images one at a time since my loss function is already defined in caffe and this would require me to redefine it.
This is what I have so far, but the results don't make sense: when I compare the loss I get from the train set to the loss I get from this, they don't match (they are orders of magnitude apart). Does anyone have any idea what my problem could be?
caffe.set_device(0)
caffe.set_mode_gpu()
net = caffe.Net('/home/jeremy/Desktop/caffestuff/JP_Kitti/all_proto/mirror_shuffle/deploy_JP.prototxt','/home/jeremy/Desktop/caffestuff/JP_Kitti/all_proto/mirror_shuffle/snapshot_iter_10000.caffemodel',caffe.TEST)
solver = None # ignore this workaround for lmdb data (can't instantiate two solvers on the same data)
solver = caffe.SGDSolver('/home/jeremy/Desktop/caffestuff/JP_Kitti/all_proto/mirror_shuffle/lenet_auto_solverJP_test.prototxt')
niter = 100
test_loss = zeros(niter)
count = 0
for it in range(niter):
    solver.test_nets[0].forward()  # one forward pass over the test net
    # store the test loss
    test_loss[count] = solver.test_nets[0].blobs['loss'].data
    print(solver.test_nets[0].blobs['loss'].data)
    count = count + 1
See my answer here. Do not forget to subtract the mean, otherwise you'll get low accuracy. The link to the code, posted above, takes care of that.
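As a side note, a minimal sketch of the mean subtraction mentioned above when feeding images in one by one (the mean.binaryproto path is an assumption):
import numpy as np
import caffe

# Load the dataset mean from a .binaryproto file (path is illustrative)
blob = caffe.proto.caffe_pb2.BlobProto()
with open('mean.binaryproto', 'rb') as f:
    blob.ParseFromString(f.read())
mu = caffe.io.blobproto_to_array(blob)[0].mean(1).mean(1)  # per-channel mean

# Subtract the mean (and reorder axes) when preparing single images for the net
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))  # HxWxC -> CxHxW
transformer.set_mean('data', mu)
# image = transformer.preprocess('data', caffe.io.load_image('some_image.png'))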