I have trained a network using MXNet, but I am not sure how I can save and load the parameters for later use. First I define and train the network:
import mxnet as mx

dataIn = mx.sym.var('data')
fc1 = mx.symbol.FullyConnected(data=dataIn, num_hidden=100)
act1 = mx.sym.Activation(data=fc1, act_type="relu")
fc2 = mx.symbol.FullyConnected(data=act1, num_hidden=50)
act2 = mx.sym.Activation(data=fc2, act_type="relu")
fc3 = mx.symbol.FullyConnected(data=act2, num_hidden=25)
act3 = mx.sym.Activation(data=fc3, act_type="relu")
fc4 = mx.symbol.FullyConnected(data=act3, num_hidden=10)
act4 = mx.sym.Activation(data=fc4, act_type="relu")
fc5 = mx.symbol.FullyConnected(data=act4, num_hidden=2)
lenet = mx.sym.SoftmaxOutput(data=fc5, name='softmax', normalization='batch')
# create iterator around training and validation data
train_iter = mx.io.NDArrayIter(data=data[:ntrain], label=phen[:ntrain], batch_size=batch_size, shuffle=True)
val_iter = mx.io.NDArrayIter(data=data[ntrain:], label=phen[ntrain:], batch_size=batch_size)
# create a trainable module on GPU 0
lenet_model = mx.mod.Module(symbol=lenet, context=mx.gpu())
# train the model
lenet_model.fit(train_iter,
                eval_data=val_iter,
                optimizer='adam',
                optimizer_params={'learning_rate': 0.00001},
                eval_metric='f1',
                batch_end_callback=mx.callback.Speedometer(batch_size, 10),
                num_epoch=1000)
This model performs well on the test set, so I want to keep it. Next, I save the network layout and the parameterization:
lenet.save('./testNet_symbol.mxnet')
lenet_model.save_params('./testNet_module.mxnet')
All the documentation I can find on loading a network seems to implement the save function within the training routine, saving the network parameters at the end of each epoch; I haven't set those checkpoints during the training process. Other methods use the mx.model.FeedForward class, which doesn't seem appropriate. Still other methods load the network from a .json file, which my save functions don't produce. How can I save/load a network after it's already finished training?
You just have to do this instead to save:
lenet_model.save_checkpoint('lenet', num_epoch, save_optimizer_states=True)
This creates three files if the states flag is set to True, otherwise two:
<prefix>-<epoch>.params (the model weights)
<prefix>-symbol.json (the network symbol)
<prefix>-<epoch>.states (the optimizer states)
And this to load:
lenet_model = mx.mod.Module.load(prefix, epoch)
lenet_model.bind(for_training=False, data_shapes=[('data', (1, 3, 224, 224))])
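After binding, the restored module can be used directly for inference; a minimal sketch, assuming the val_iter from the question (the data_shapes passed to bind must match its batch shape):
# Score the restored model on held-out data with the standard Module API.
outputs = lenet_model.predict(val_iter)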
Here I share a code snippet for training an encoder-decoder model for machine translation. When reusing the embedding layer (trained previously) during inference mode (on test data), it threw the error shown below.
# Imports assumed for this snippet
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense

# Encoder
encoder_inputs = Input(shape=(None,))
enc_emb = Embedding(eng_vocab_size, latent_dim, mask_zero=True)(encoder_inputs)
encoder_lstm = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(enc_emb)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None,))
dec_emb = Embedding(deu_vocab_size, latent_dim, mask_zero=True)(decoder_inputs)
# The decoder returns full output sequences, and internal states as well.
# We don't use the return states in the training model,
# but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_emb,
                                     initial_state=encoder_states)
decoder_dense = Dense(deu_vocab_size, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
# Compile the model
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])
# Encode the input sequence to get the "thought vectors"
encoder_model = Model(encoder_inputs, encoder_states)
# Decoder setup
# Below tensors will hold the states of the previous time step
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
dec_emb2= dec_emb(decoder_inputs) # reusing embedding layer
decoder_outputs2, state_h2, state_c2 = decoder_lstm(dec_emb2, initial_state=decoder_states_inputs) # reusing lstm layer
decoder_outputs2 = decoder_dense(decoder_outputs2) # softmax_layer to generate prob_dist. over target vocab
# Final decoder model
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs2])
ERROR
8 decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
---> 10 dec_emb2= dec_emb(decoder_inputs) # reusing embedding layer
11
12 decoder_outputs2, state_h2, state_c2 = decoder_lstm(dec_emb2, initial_state=decoder_states_inputs) # reusing lstm layer
TypeError: 'KerasTensor' object is not callable
I read through various solutions available for this issue, but couldn't understand what two modes of the model they were talking about or what their solution was effectively doing.
Please explain in detail. Thanks in advance.
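The error is likely because dec_emb is the output tensor of the Embedding layer, not the layer object itself, so dec_emb(decoder_inputs) tries to call a KerasTensor. The "two modes" those solutions refer to are the training model and the separate inference (decoding) model: they share weights by reusing the same layer objects. A minimal sketch of the usual fix, keeping a handle on the layer so both models can call it:
# Keep a reference to the layer object, then apply it in both models.
dec_emb_layer = Embedding(deu_vocab_size, latent_dim, mask_zero=True)
dec_emb = dec_emb_layer(decoder_inputs)     # training model: apply the layer
# ... later, when building the inference decoder:
dec_emb2 = dec_emb_layer(decoder_inputs)    # call the layer, not the tensor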
I am trying to convert a PyTorch model with multiple networks to ONNX, and have encountered a problem.
The git repo: https://github.com/InterDigitalInc/HRFAE
The Trainer Class:
class Trainer(nn.Module):
    def __init__(self, config):
        super(Trainer, self).__init__()
        # Load hyperparameters
        self.config = config
        # Networks
        self.enc = Encoder()
        self.dec = Decoder()
        self.mlp_style = Mod_Net()
        self.dis = Dis_PatchGAN()
        ...
Here is how the trained model processes an image:
def gen_encode(self, x_a, age_a, age_b=0, training=False, target_age=0):
    if target_age:
        self.target_age = target_age
        age_modif = self.target_age*torch.ones(age_a.size()).type_as(age_a)
    else:
        age_modif = self.random_age(age_a, diff_val=25)
    # Generate modified image
    self.content_code_a, skip_1, skip_2 = self.enc(x_a)
    style_params_a = self.mlp_style(age_a)
    style_params_b = self.mlp_style(age_modif)
    x_a_recon = self.dec(self.content_code_a, style_params_a, skip_1, skip_2)
    x_a_modif = self.dec(self.content_code_a, style_params_b, skip_1, skip_2)
    return x_a_recon, x_a_modif, age_modif
And here is what I did to convert it to ONNX:
enc = Encoder()
dec = Decoder()
mlp = Mod_Net()
layers = [enc, mlp, dec]
model = torch.nn.Sequential(*layers)
# here is my confusion: how do I specify the inputs of each layer??
# E.g. one of the outputs of 'enc' layer should be input of 'mlp' layer,
# or the outputs of 'enc' layer should be part of inputs of 'dec' layer...
params = torch.load('./logs/001/checkpoint')
model[0].load_state_dict(params['enc_state_dict'])
model[1].load_state_dict(params['mlp_style_state_dict'])
model[2].load_state_dict(params['dec_state_dict'])
torch.onnx.export(model, torch.randn([1, 3, 1024, 1024]), 'trained_hrfae.onnx', do_constant_folding=True)
Maybe the conversion code is done the wrong way?
Could anyone help? Many thanks!
#20210629-11:52GMT Edit:
I found there is a constraint on using torch.nn.Sequential: the output of each layer in the Sequential must be consistent with the input of the next.
So my code shouldn't work at all, because the output of the 'enc' layer is not consistent with the input of the 'mlp' layer.
Could anyone advise how to convert this type of PyTorch model to ONNX? Many thanks, again :)
After some research and experimentation, I found a method that may be the correct way:
Convert each net (Encoder, Mod_Net, Decoder) to its own ONNX model, and handle their inputs/outputs in later logic or in any further procedure (e.g. converting to a tflite model), as sketched below.
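As a concrete illustration, a sketch of that per-network export (the input shapes here are assumptions; the decoder is omitted because its dummy inputs must match the actual shapes of the encoder and Mod_Net outputs):
# Export the encoder and style net separately; the decoder would need dummy
# tensors shaped like (content_code, style_params, skip_1, skip_2).
params = torch.load('./logs/001/checkpoint')
enc.load_state_dict(params['enc_state_dict'])
enc.eval()
torch.onnx.export(enc, torch.randn([1, 3, 512, 512]), 'enc.onnx')
mlp.load_state_dict(params['mlp_style_state_dict'])
mlp.eval()
torch.onnx.export(mlp, torch.randn([1]).type(torch.long), 'mlp.onnx')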
I'm trying to port this onto Android using this method.
#Edit 20210705-03:52GMT#
Another approach may be better: write a new net that combines the three nets. I've verified that its output is the same as the original PyTorch model's.
class HRFAE(nn.Module):
    def __init__(self):
        super(HRFAE, self).__init__()
        self.enc = Encoder()
        self.mlp_style = Mod_Net()
        self.dec = Decoder()

    def forward(self, x, age_modif):
        content_code_a, skip_1, skip_2 = self.enc(x)
        style_params_b = self.mlp_style(age_modif)
        x_a_modif = self.dec(content_code_a, style_params_b, skip_1, skip_2)
        return x_a_modif
and then convert using the following:
net = HRFAE()
params = torch.load('./logs/002/checkpoint')
net.enc.load_state_dict(params['enc_state_dict'])
net.mlp_style.load_state_dict(params['mlp_style_state_dict'])
net.dec.load_state_dict(params['dec_state_dict'])
net.eval()
torch.onnx.export(net, (torch.randn([1, 3, 512, 512]), torch.randn([1]).type(torch.long)), 'test_hrfae.onnx')
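To sanity-check the export, one can run the ONNX file with onnxruntime and compare its output against net's (a sketch, assuming onnxruntime is installed):
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('test_hrfae.onnx')
x = np.random.randn(1, 3, 512, 512).astype(np.float32)
age = np.zeros([1], dtype=np.int64)
feeds = dict(zip([i.name for i in sess.get_inputs()], [x, age]))
onnx_out = sess.run(None, feeds)[0]  # compare with net(torch.from_numpy(x), torch.from_numpy(age))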
This should be the answer.
I have tried to save and load the model as follows. All keys are mapped, but there are no predictions in the output.
#1
import torch
from detectron2.modeling import build_model
model = build_model(cfg)
torch.save(model.state_dict(), 'checkpoint.pth')
model.load_state_dict(torch.load(checkpoint_path, map_location='cpu'))
I also tried doing it using the official docs, but I can't understand the input format part:
#2
from detectron2.checkpoint import DetectionCheckpointer
DetectionCheckpointer(model).load(file_path_or_url) # load a file, usually from cfg.MODEL.WEIGHTS
checkpointer = DetectionCheckpointer(model, save_dir="output")
checkpointer.save("model_999") # save to output/model_999.pth
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file('COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml'))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5 # Set threshold for this model
cfg.MODEL.WEIGHTS = '/content/model_final.pth' # Set path model .pth
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
predictor = DefaultPredictor(cfg)
My code to load the custom model works.
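In case it helps: two things commonly cause empty predictions with build_model (assumptions about this case, not a confirmed diagnosis): the model is returned in training mode, and raw detectron2 models expect a list of dicts rather than a bare tensor. A minimal sketch of the documented inference format, where img is a hypothetical HWC BGR numpy array:
# Raw detectron2 models take list[dict]; each dict carries a CHW float tensor.
model.eval()
with torch.no_grad():
    image = torch.as_tensor(img.astype("float32").transpose(2, 0, 1))
    outputs = model([{"image": image}])  # list of dicts, each with an "instances" field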
I am trying to implement Q-learning with an action-value approximation function. I am using openai-gym and the "MountainCar-v0" environment to test my algorithm. My problem is that it does not converge or find the goal at all.
Basically, the approximator works like the following: you feed in the 2 features, position and velocity, plus one of the 3 actions in a one-hot encoding: 0 -> [1,0,0], 1 -> [0,1,0] and 2 -> [0,0,1]. The output is the action-value approximation Q_approx(s,a) for that one specific action.
I know that usually the input is the state (2 features) and the output layer contains 1 output for each action. The big difference that I see is that I run the feed-forward pass 3 times (once for each action) and take the max, while in the standard implementation you run it once and take the max over the outputs.
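For comparison, the standard layout would be a single network mapping the state to one Q-value per action, something like this sketch (layer sizes mirror the code below):
# Standard DQN-style head: state (2 features) in, one Q-value per action out.
from keras.models import Sequential
from keras.layers import Dense

standard_model = Sequential()
standard_model.add(Dense(20, activation="relu", input_dim=2))
standard_model.add(Dense(10, activation="relu"))
standard_model.add(Dense(3))  # Q(s, a) for all 3 actions in one forward pass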
Maybe my implementation is just completely wrong and I am thinking about it wrong. I'll paste the code here; it is a mess, but I am just experimenting a bit:
import gym
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation
env = gym.make('MountainCar-v0')
# The mean reward over 20 episodes
mean_rewards = np.zeros(20)
# Feature numpy holder
features = np.zeros(5)
# Q_a value holder
qa_vals = np.zeros(3)
one_hot = {
    0: np.asarray([1, 0, 0]),
    1: np.asarray([0, 1, 0]),
    2: np.asarray([0, 0, 1])
}
model = Sequential()
model.add(Dense(20, activation="relu", input_dim=5))
model.add(Dense(10, activation="relu"))
model.add(Dense(1))
model.compile(optimizer='rmsprop',
              loss='mse',
              metrics=['accuracy'])
epsilon_greedy = 0.1
discount = 0.9
batch_size = 16
# Experience replay containing features and target
experience = np.ones((10*300, 5+1))
# Ring-buffer bookkeeping (initial values, needed before the loops below)
fill_index = 0
filled_once = False

# Ring buffer
def add_exp(features, target, index):
    if index % experience.shape[0] == 0:
        index = 0
        global filled_once
        filled_once = True
    experience[index, 0:5] = features
    experience[index, 5] = target
    index += 1
    return index
for e in range(0, 100000):
    obs = env.reset()
    old_obs = None
    new_obs = obs
    rewards = 0
    loss = 0
    # Note: the step counter is renamed to t so it is not shadowed by the
    # inner enumerate loop below.
    for t in range(0, 300):
        if old_obs is not None:
            # Find q_a max for s_(t+1)
            features[0:2] = new_obs
            for i, pa in enumerate([0, 1, 2]):
                features[2:5] = one_hot[pa]
                qa_vals[i] = model.predict(features.reshape(-1, 5))
            rewards += reward
            target = reward + discount*np.max(qa_vals)
            features[0:2] = old_obs
            features[2:5] = one_hot[a]
            fill_index = add_exp(features, target, fill_index)
            # Find new action
            if np.random.random() < epsilon_greedy:
                a = env.action_space.sample()
            else:
                a = np.argmax(qa_vals)
        else:
            a = env.action_space.sample()
        obs, reward, done, info = env.step(a)
        old_obs = new_obs
        new_obs = obs
        if done:
            break
        if filled_once:
            samples_ids = np.random.choice(experience.shape[0], batch_size)
            loss += model.train_on_batch(experience[samples_ids, 0:5],
                                         experience[samples_ids, 5].reshape(-1))[0]
    mean_rewards[e % 20] = rewards
    print("e = {} and loss = {}".format(e, loss))
    if e % 50 == 0:
        print("e = {} and mean = {}".format(e, mean_rewards.mean()))
Thanks in advance!
There shouldn't be much difference between the actions as inputs to your network or as different outputs of your network. It does make a huge difference if your states are images, for example, because conv nets work very well with images and there would be no obvious way of integrating the actions into the input.
Have you tried the cartpole balancing environment? It is better for testing whether your model is working correctly.
MountainCar is pretty hard. It has no reward until you reach the top, which often doesn't happen at all. The model will only start learning something useful once you get to the top once. If you are never getting to the top, you should probably increase the time spent on exploration; in other words, take more random actions, a lot more.
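For instance, a sketch of annealed exploration (the constants are arbitrary assumptions) that could replace the fixed epsilon_greedy = 0.1 in the question's code:
# Decay exploration per episode: start near-random, floor at 10% random actions.
epsilon_start, epsilon_min, decay = 1.0, 0.1, 0.999
epsilon_greedy = max(epsilon_min, epsilon_start * decay ** e)  # e = episode index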
I am trying to load several models in pycaffe so I can perform ensemble learning. I can load one model fine and run tests on it, but when I try to load my second model it just stops.
I use the following code:
import caffe

model_def_net0 = caffe_root + 'Dataset/net0/deploy.prototxt'
model_weights_net0 = caffe_root + 'Dataset/net0/net0_train_vgg_iter_250000.caffemodel'
net0 = caffe.Net(model_def_net0,      # defines the structure of the model
                 model_weights_net0,  # contains the trained weights
                 caffe.TEST)          # use test mode (e.g., don't perform dropout)
model_def_caffe = caffe_root + 'Dataset/caffenet/deploy_3.prototxt'
model_weights_caffe = caffe_root + 'Dataset/caffenet/caffenet_train_iter_150000.caffemodel'
net1 = caffe.Net(model_def_caffe,     # defines the structure of the model
                 model_weights_caffe, # contains the trained weights
                 caffe.TEST)          # use test mode (e.g., don't perform dropout)
When I execute the code, it halts while trying to load net1, with the following at the end of the log:
I0516 18:29:39.718916 7339 net.cpp:411] data -> data
I0516 18:29:39.718942 7339 net.cpp:411] data -> label