I know how to merge different models into one in Keras:
first_model = Sequential()
first_model.add(LSTM(output_dim, input_shape=(m, input_dim)))

second_model = Sequential()
second_model.add(LSTM(output_dim, input_shape=(n - m, input_dim)))

model = Sequential()
model.add(Merge([first_model, second_model], mode='concat'))
model.fit([X1, X2], y)
I am not sure how to do this in TensorFlow though.
I have two LSTM models and want to merge them (in the same way as in the Keras example above).
outputs_1, state_1 = tf.nn.dynamic_rnn(stacked_lstm_1, model_input_1, dtype=tf.float32)
outputs_2, state_2 = tf.nn.dynamic_rnn(stacked_lstm_2, model_input_2, dtype=tf.float32)
Any help would be much appreciated!
As was said in the comment, I believe the simplest way to do this is just to concatenate the outputs. The only complication I've found is that, at least the way I built my LSTM layers, they ended up with exactly the same names for their weight tensors. This caused an error because TensorFlow thought the weights had already been created when I tried to build the second layer. If you run into this problem, you can solve it with a variable scope, which applies to the names of the tensors in that LSTM layer:
with tf.variable_scope("LSTM_1"):
lstm_cells_1 = tf.contrib.rnn.MultiRNNCell(tf.contrib.rnn.LSTMCell(256))
output_1, state_1 = tf.nn.dynamic_rnn(lstm_cells_1, inputs_1)
last_output_1 = output_1[:, -1, :]
# I usually work with the last one; you can keep them all, if you want
with tf.variable_scope("LSTM_2"):
lstm_cells_2 = tf.contrib.rnn.MultiRNNCell(tf.contrib.rnn.LSTMCell(256))
output_2, state_2 = tf.nn.dynamic_rnn(lstm_cells_2, inputs_2)
last_output_2 = output_2[:, -1, :]
merged = tf.concat((last_output_1, last_output_2), axis=1)
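From here, merged can be fed into whatever comes next in your network. As a minimal sketch (the dense head and num_classes are my assumptions, not part of the question):

# Hypothetical downstream head; num_classes is an assumed placeholder.
num_classes = 10
logits = tf.layers.dense(merged, units=num_classes)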
I'm currently learning how to use PyTorch to model NNs and did the "Getting Started" session on the PyTorch website.
I tried to train a PyTorch NN to apply a function, e.g. f(x) = 2x - 1, to a given list of input integers, but my model is far from learning the right thing.
How can I model and train a PyTorch model to learn a given mathematical function f(x)?
I've tried the model below and trained it on 10 random numbers, with labels generated by the myFunc function, to learn the function 2x - 1.
Thanks for your help.
import torch.nn as nn
import torch.nn.functional as F

batch_size = 10

def myFunc(a):
    # y = 2x - 1
    return 2 * a - 1

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        # Note: the layer sizes use batch_size as if it were the feature count
        self.lin1 = nn.Linear(batch_size, 1)
        self.lin2 = nn.Linear(1, batch_size)

    def forward(self, x):
        x = self.lin1(x)
        x = F.relu(x)
        x = self.lin2(x)
        return x

model = NeuralNetwork()
Theoretically, for your example of an affine-linear function over a bounded interval, you only need
linear(bias) -> relu -> linear(bias)
with one node per linear layer, or just one linear layer without activation.
For more general functions, you will need wider layers in the first kind of construction, with one node for every piece of a piece-wise approximation. The last layer always needs to be linear, without activation. Using more layers may give you more pieces with fewer total nodes.
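As a minimal sketch of the single-linear-layer option (the data shapes, optimizer, and hyperparameters are my choices, not from the question):

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(100, 1)          # 100 training inputs, one feature each
y = 2 * x - 1                    # targets from f(x) = 2x - 1

model = nn.Linear(1, 1)          # one weight and one bias: exactly enough for 2x - 1
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for epoch in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(model.weight.item(), model.bias.item())  # should approach 2.0 and -1.0

Note that the layer maps one input feature to one output, so the batch size never appears in the layer sizes; that is the main fix relative to the model in the question.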
I am trying to build an LSTM autoencoder for anomaly detection, but the model does not seem to work for my data.
Here is the normal data that I use for training, and here is the abnormal data that I use for validation.
If the model worked, its loss should be high in the interval #200000~#500000. Unfortunately, when I feed the validation data to the model, the loss is still low in the abnormal interval.
Here is my code for training the model. I would greatly appreciate any suggestions.
from sklearn.preprocessing import MinMaxScaler
from keras import optimizers
from keras.callbacks import EarlyStopping
from keras.layers import Dense, LSTM
from keras.models import Sequential
import matplotlib.pyplot as plt
import numpy as np

timesteps = 32
dim = 1

scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(healthy_data)
data = scaler.transform(healthy_data).reshape(-1, timesteps, dim)
data_broken_scaled = scaler.transform(broken_data).reshape(-1, timesteps, dim)

lr = 0.0001
nadam = optimizers.Nadam(lr=lr)

model = Sequential()
model.add(LSTM(50, input_shape=(timesteps, dim), return_sequences=True))
model.add(Dense(dim))
model.compile(loss='mae', optimizer=nadam, metrics=['mse'])

EStop = EarlyStopping(monitor='val_loss', min_delta=0.001, patience=150,
                      verbose=2, mode='auto', restore_best_weights=True)
history = model.fit(data, data, validation_data=(data, data), epochs=3000,
                    batch_size=72, verbose=2, shuffle=False, callbacks=[EStop]).history

pred_broken = model.predict(data_broken_scaled)
loss_broken = np.mean(np.abs(pred_broken - data_broken_scaled), axis=1)

fig, ax = plt.subplots(figsize=(20, 6), dpi=80, facecolor='w', edgecolor='k')
ax.plot(range(len(loss_broken)), loss_broken, '-', color='red', linewidth=1)
I think it may be better to use a Fourier transform to detect anomalies in the frequency domain.
That is, convert the train and test data to the frequency domain via a Fourier transform. I also recommend using time windows.
Most models perform better when they have enough good training data. Your anomaly points should be labeled 1 (in a pre-processing step of the data), and they should show up as high-frequency content within your time windows.
In short, I think your training data may not be sufficiently representative of anomalies yet. A rough sketch of the windowed-FFT idea follows.
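This is only a sketch under assumptions: healthy_data is taken from the question, and the window length of 256 is arbitrary.

import numpy as np

# Split the 1-D series into fixed-length time windows.
signal = np.asarray(healthy_data).ravel()
window = 256
n_windows = len(signal) // window
windows = signal[:n_windows * window].reshape(n_windows, window)

# Magnitude spectrum per time window; these rows could be used as model
# features or compared against the spectra of known-healthy windows.
spectra = np.abs(np.fft.rfft(windows, axis=1))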
import tensorflow as tf
import keras.backend as K
from keras.applications.mobilenet import MobileNet

run_meta = tf.RunMetadata()
with tf.Session(graph=tf.Graph()) as sess:
    K.set_session(sess)
    with tf.device('/cpu:0'):
        base_model = MobileNet(alpha=1, weights=None,
                               input_tensor=tf.placeholder('float32', shape=(1, 224, 224, 3)))

    opts = tf.profiler.ProfileOptionBuilder.float_operation()
    flops = tf.profiler.profile(sess.graph, run_meta=run_meta, cmd='op', options=opts)

    opts = tf.profiler.ProfileOptionBuilder.trainable_variables_parameter()
    params = tf.profiler.profile(sess.graph, run_meta=run_meta, cmd='op', options=opts)

    print("{:,} --- {:,}".format(flops.total_float_ops, params.total_parameters))
When I run the above code, I get the result below:
1,137,481,704 --- 4,253,864
This is different from the flops described in the paper.
MobileNet: https://arxiv.org/pdf/1704.04861.pdf
ShuffleNet: https://arxiv.org/pdf/1707.01083.pdf
How can I calculate the exact flops described in the papers?
tl;dr You've actually got the right answer! You are simply comparing flops with multiply accumulates (from the paper) and therefore need to divide by two.
If you're using Keras, then the code you listed is slightly over-complicating things...
Let model be any compiled Keras model. We can arrive at the flops of the model with the following code.
import tensorflow as tf
import keras.backend as K

def get_flops():
    run_meta = tf.RunMetadata()
    opts = tf.profiler.ProfileOptionBuilder.float_operation()

    # We use the Keras session graph in the call to the profiler.
    flops = tf.profiler.profile(graph=K.get_session().graph,
                                run_meta=run_meta, cmd='op', options=opts)

    return flops.total_float_ops  # The "flops" of the model.

# .... Define your model here ....
# You need to have compiled your model before calling this.
print(get_flops())
However, when I looked at my own example (not MobileNet) on my computer, the printed total_float_ops was 2115, and I had the following results when I simply printed the flops variable:
[...]
Mul 1.06k float_ops (100.00%, 49.98%)
Add 1.06k float_ops (50.02%, 49.93%)
Sub 2 float_ops (0.09%, 0.09%)
It's pretty clear that the total_float_ops property takes into consideration multiplication, addition and subtraction.
I then looked back at the MobileNet example; skimming the paper, I matched the default Keras implementation to the paper's table by the number of parameters:
The first model in the table matches the result you have (4,253,864), and its Mult-Adds are approximately half of the flops result that you have. Therefore you have the correct answer; it's just that you were mistaking flops for Mult-Adds (aka multiply accumulates or MACs).
If you want to compute the number of MACs you simply have to divide the result from the above code by two.
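For example, 1,137,481,704 / 2 = 568,740,852, i.e. roughly 569 million, which lines up with the 569 Million Mult-Adds the paper's table reports for the full MobileNet-224 model.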
Important Notes
Keep the following in mind if you are trying to run the code sample:
The code sample was written in 2018 and doesn't work with TensorFlow version 2. See @driedler's answer for a complete example of TensorFlow version 2 compatibility.
The code sample was originally meant to be run once on a compiled model... For a better example of using this in a way that does not have side effects (and can therefore be run multiple times on the same model), see @ch271828n's answer.
This is working for me in TF-2.1:
import tensorflow as tf

def get_flops(model_h5_path):
    session = tf.compat.v1.Session()
    graph = tf.compat.v1.get_default_graph()

    with graph.as_default():
        with session.as_default():
            model = tf.keras.models.load_model(model_h5_path)

            run_meta = tf.compat.v1.RunMetadata()
            opts = tf.compat.v1.profiler.ProfileOptionBuilder.float_operation()

            # Optional: save printed results to file
            # flops_log_path = os.path.join(tempfile.gettempdir(), 'tf_flops_log.txt')
            # opts['output'] = 'file:outfile={}'.format(flops_log_path)

            # We use the Keras session graph in the call to the profiler.
            flops = tf.compat.v1.profiler.profile(graph=graph,
                                                  run_meta=run_meta, cmd='op', options=opts)

    return flops.total_float_ops
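Usage, for example (the .h5 path is a placeholder; point it at your own saved model):

# Hypothetical usage; replace the path with your own saved model.
print(get_flops('/path/to/my_model.h5'))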
The above solutions cannot be run twice, otherwise the flops will accumulate! (In other words, the second time you run it, you will get output = flops_of_1st_call + flops_of_2nd_call.) The following code calls reset_default_graph to avoid this.
import tensorflow as tf
from tensorflow import keras

def get_flops():
    session = tf.compat.v1.Session()
    graph = tf.compat.v1.get_default_graph()

    with graph.as_default():
        with session.as_default():
            model = keras.applications.mobilenet.MobileNet(
                alpha=1, weights=None,
                input_tensor=tf.compat.v1.placeholder('float32', shape=(1, 224, 224, 3)))

            run_meta = tf.compat.v1.RunMetadata()
            opts = tf.compat.v1.profiler.ProfileOptionBuilder.float_operation()

            # Optional: save printed results to file
            # flops_log_path = os.path.join(tempfile.gettempdir(), 'tf_flops_log.txt')
            # opts['output'] = 'file:outfile={}'.format(flops_log_path)

            # We use the Keras session graph in the call to the profiler.
            flops = tf.compat.v1.profiler.profile(graph=graph,
                                                  run_meta=run_meta, cmd='op', options=opts)

    # Reset the graph so repeated calls do not accumulate flops.
    tf.compat.v1.reset_default_graph()

    return flops.total_float_ops
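Usage is unchanged; because the graph is reset inside the function, repeated calls return the same value:

print(get_flops())  # first call
print(get_flops())  # same value; no accumulation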
Modified from @driedler, thanks!
You can also call model.summary() on any Keras model, but note that it reports the number of parameters, not FLOPS.
I am working on image classification using Keras.
Here is my model:
model = Sequential()
model.add(Conv2D(filters = 8, kernel_size = (3,3),padding = 'Same',
activation ='relu', input_shape = (64,64,3)))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Conv2D(filters = 16, kernel_size = (3,3),padding = 'Same',
activation ='relu'))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Conv2D(filters = 32, kernel_size = (3,3),padding = 'Same',
activation ='relu'))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(3, activation = "sigmoid"))
I would like to avoid Flatten() since in this case we lose some spatial information. I looked at some tutorials, but all of them used Flatten(). Is it possible to use something like deconvolution instead?
Flatten is fine.
The "spatial" relations of Flatten going into a Dense layer are, in a sense preserved. As all values from a particular position map to the same weight in the dense layer. So every point is mapped consistently across the dataset. The "spatial" relations mapped in the convolutional layers are looking for localized patterns, and in those layers keeping the input unaltered is important.
I am trying to understand the catboost overfitting detector. It is described here:
https://tech.yandex.com/catboost/doc/dg/concepts/overfitting-detector-docpage/#overfitting-detector
Other gradient boosting packages like lightgbm and xgboost use a parameter called early_stopping_rounds, which is easy to understand (it stops the training once the validation error hasn't decreased in early_stopping_rounds steps).
However, I have a hard time understanding the p_value approach used by catboost. Can anyone explain how this overfitting detector works and when it stops the training?
It's not documented on the Yandex website or at the github repository, but if you look carefully through the python code posted to github (specifically here), you will see that the overfitting detector is activated by setting "od_type" in the parameters. Reviewing the recent commits on github, the catboost developers also recently implemented a tool similar to the "early_stopping_rounds" parameter used by lightGBM and xgboost, called "Iter."
To set the number of rounds after the most recent best iteration to wait before stopping, provide a numeric value in the "od_wait" parameter.
For example:
fit_param <- list(
  iterations = 500,
  thread_count = 10,
  loss_function = "Logloss",
  depth = 6,
  learning_rate = 0.03,
  od_type = "Iter",
  od_wait = 100
)
I am using the catboost library with R 3.4.1. I have found that setting the "od_type" and "od_wait" parameters in the fit_param list works well for my purposes.
I realize this is not answering your question about the way to use the p_value approach also implemented by the catboost developers; unfortunately I cannot help you there. Hopefully someone else can explain that setting to the both of us.
Catboost now supports early_stopping_rounds: fit method parameters
Sets the overfitting detector type to Iter and stops the training after the specified number of iterations since the iteration with the optimal metric value.
This works very much like early_stopping_rounds in xgboost.
Here is an example:
from catboost import CatBoostRegressor, Pool
from sklearn.model_selection import train_test_split
import numpy as np

y = np.random.normal(0, 1, 1000)
X = np.random.normal(0, 1, (1000, 1))
X[:, 0] += y * 2

X_train, X_eval, y_train, y_eval = train_test_split(X, y, test_size=0.1)
train_pool = Pool(X_train, y_train)
eval_pool = Pool(X_eval, y_eval)

model = CatBoostRegressor(iterations=1000, learning_rate=0.1)
model.fit(train_pool, eval_set=eval_pool, early_stopping_rounds=10)
The result should be something like this:
522: learn: 0.3994718 test: 0.4294720 best: 0.4292901 (514) total: 957ms remaining: 873ms
523: learn: 0.3994580 test: 0.4294614 best: 0.4292901 (514) total: 958ms remaining: 870ms
524: learn: 0.3994495 test: 0.4294806 best: 0.4292901 (514) total: 959ms remaining: 867ms
Stopped by overfitting detector (10 iterations wait)
bestTest = 0.4292900745
bestIteration = 514
Shrink model to first 515 iterations.
early_stopping_rounds takes into account both the od_type='Iter' and od_wait parameters. There is no need to set od_type and od_wait individually; just set the early_stopping_rounds parameter.
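For example, these two configurations should behave the same (a sketch based on the parameter descriptions above):

from catboost import CatBoostRegressor

# Detector configured via constructor parameters...
model_a = CatBoostRegressor(iterations=1000, od_type='Iter', od_wait=10)

# ...or, equivalently, via fit(): early_stopping_rounds=10 implies od_type='Iter', od_wait=10.
model_b = CatBoostRegressor(iterations=1000)
# model_b.fit(train_pool, eval_set=eval_pool, early_stopping_rounds=10)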