Fluctuating RAM in Google Colab while running a BERT model - deep-learning

I am running a simple comment classification task on Google Colab, using DistilBERT for contextual embeddings. I use only 4,000 training samples because the notebook keeps crashing.
When I run the cell that obtains the embeddings, I keep an eye on how the RAM utilisation grows. I am seeing that it oscillates somewhere between 3 GB and 8 GB.
Shouldn't it just keep increasing? Can anyone explain how this works at a lower level?
Here is my code; the last cell block is where I am seeing the behaviour described above.
import numpy as np
import torch
import transformers as ppb  # assumption: the post's `ppb` alias refers to the transformers package

# For DistilBERT:
model_class, tokenizer_class, pretrained_weights = (ppb.DistilBertModel, ppb.DistilBertTokenizer, 'distilbert-base-uncased')

## Want BERT instead of distilBERT? Uncomment the following line:
#model_class, tokenizer_class, pretrained_weights = (ppb.BertModel, ppb.BertTokenizer, 'bert-base-uncased')

# Load pretrained model/tokenizer
tokenizer = tokenizer_class.from_pretrained(pretrained_weights)
model = model_class.from_pretrained(pretrained_weights)

max_len = 80
# Tokenize each comment, truncating to max_len, then pad with zeros
tokenized = sample['comment_text'].apply(lambda x: tokenizer.encode(x, add_special_tokens=True, max_length=max_len))
padded = np.array([i + [0] * (max_len - len(i)) for i in tokenized.values])
attention_mask = np.where(padded != 0, 1, 0)
attention_mask.shape

input_ids = torch.tensor(padded)
attention_mask = torch.tensor(attention_mask)

with torch.no_grad():
    last_hidden_states = model(input_ids, attention_mask=attention_mask)
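(Not part of the original notebook, purely as an illustration: the same forward pass could be run in smaller batches so that only one batch of activations is held in memory at a time. This sketch assumes the tuple-style return where out[0] is the last hidden state.)
import numpy as np
import torch

batch_size = 64
chunks = []
with torch.no_grad():
    for start in range(0, input_ids.shape[0], batch_size):
        batch_ids = input_ids[start:start + batch_size]
        batch_mask = attention_mask[start:start + batch_size]
        out = model(batch_ids, attention_mask=batch_mask)
        # keep only the [CLS] vector of the last hidden state for each comment
        chunks.append(out[0][:, 0, :].numpy())
features = np.concatenate(chunks, axis=0)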

Related

Not saving html interactive file with R

I am trying to design a circos plot using the BioCircos R package. BioCircos allows saving the plots as interactive .html files. However, when I run the package using Rscript, the saved .html file is empty. To save the .html file I used the saveWidget option from the htmlwidgets package. Is there something wrong with the saveWidget option? The code I used follows:
#!/usr/bin/Rscript
######R script for BioCircos test
library(htmlwidgets)
library(BioCircos)
genomes <- list("chra1" = 217471166, "chra2" = 181034961, "chra3" = 153873357, "chra4" = 153961319, "chra5" = 164033575,
"chra6" = 154486312, "chra7" = 133565930, "chra8" = 147241510, "chra9" = 91218944, "chra10" = 52432566, "chrb1" = 843366180, "chrb2" = 842558404, "chrb3" = 707956555, "chrb4" = 635713434, "chrb5" = 567300182,
"chrb6" = 439630435, "chrb7" = 236595445, "chrb8" = 231667822, "chrb9" = 230778867, "chrb10" = 151572763, "chrb11" = 103205957) # custom genome
links_chromosomes_01 <- c("chra1", "chra2", "chra3", "chra4", "chra4", "chra5", "chra6", "chra7", "chra7", "chra8", "chra8", "chra9", "chra10") # Chromosomes on which the links should start
links_chromosomes_02 <- c("chrb2", "chrb3", "chrb1", "chrb9", "chrb10", "chrb4", "chrb5", "chrb6", "chrb1", "chrb8", "chrb3", "chrb7", "chrb6") # Chromosomes on which the links should end
links_pos_01 <- c(115060347, 102611974, 14761160, 128700431, 128681496, 42116205, 58890582, 40356090,
146935315, 136481944, 157464876, 39323393, 84752508, 136164354,
99573657, 102580613,
111139346, 120764772, 90748238, 122164776,
44933176, 18823342,
48771409, 128288229, 150613881, 18509106, 123913217, 51237349,
34237851, 53357604, 78270031,
25306417, 25320614,
94266153,
41447919, 28810876, 2802465,
45583472,
81968637, 27858237, 17263637,
30569409) ### links chra chromosomes
links_pos_02 <- c(410543481, 463189512, 825903588, 353914638, 354135472, 717707494, 643107332, 724899652,
583713545, 558756961, 642015290, 154999098, 340216235, 557731577,
643350872, 655077847,
85356666, 157889318, 226411560, 161566470,
109857786, 25338955,
473876792, 124495704, 46258030, 572314729, 141584107, 426419779,
531245660, 220131772, 353941099,
62422773, 62387030,
116923325,
76544045, 33452274, 7942164,
642047816,
215981114, 39278129, 23302654,
418922633) ### links chrb chromosomes
links_labels <- c("aldh1a3", "amh", "cyp26b1", "dmrt1", "dmrt3", "fgf20", "hhip", "srd5a3",
"amhr2", "dhh", "fgf9", "nr0b1", "rspo1", "wnt1",
"aldh1a2", "cyp19a1",
"lhx9", "pdgfb", "ptch2", "sox10",
"cbln1", "wt1",
"esr1", "foxl2", "gata4", "lrpprc", "serpine2", "srd5a2",
"asns", "ctnnb1", "srd5a1",
"cyp26a1", "cyp26c1",
"wnt4",
"ar", "nr5a1", "ptgds",
"fgf16",
"cxcr4", "pdgfa", "sox8",
"sox9")
tracklist <- BioCircosLinkTrack('myLinkTrack', links_chromosomes_01, links_pos_01,
links_pos_01, links_chromosomes_02, links_pos_02, links_pos_02,
maxRadius = 0.55, labels = links_labels)
#plotting results
plot_chra_chrb <- BioCircos(tracklist, genome = chra_chrb_genomes, genomeFillColor = "RdBu", chrPad = 0.02, displayGenomeBorder = FALSE, genomeLabelTextSize = "10pt", genomeTicksScale = 4e+3,
elementId = "chra_chrb_comp_plot_test.html")
saveWidget(plot_chra_chrb, "chra_chrb_comp_plot_test.html", selfcontained = F, libdir = "lib")
The command line to run this script:
Rscript /path_to/Circle_plot_test.r
I tried to use this script in RStudio (without the saveWidget() command), however it took too long to run on my personal computer and the result was not displayed. This could be due to the memory usage setup, because when I removed some data the script easily generated the plot in RStudio and I was able to save it. Is there another way to save interactive .html files in R, or am I doing something wrong using the htmlwidgets package in my script?
Thanks in advance for any help and comments.
When you said it took too long to run, that was a sign that something was wrong! You weren't getting anything when you used saveWidget because nothing was returned from BioCircos.
I found two things that are a problem. The first one results in blank output: you can't use a '.' in the element ID. This ID is used in the HTML coding.
You were getting huge delays due to the scale you set for genomeTicksScale. That scaling value is for a tick mark attribute. I'm not sure why you set it to 4e+3. However, when I comment out that line, it renders immediately. I have no issues with saving the widget, either.
One other thing: you had chra_chrb_genomes as the object name assigned to the parameter genome in the BioCircos function. I assumed this was the object genomes from your question, since it was the only unused object.
The only things I changed were in the BioCircos function:
(plot_chra_chrb <- BioCircos(tracklist, genome = genomes, #chra_chrb_genomes,
genomeFillColor = "RdBu",
chrPad = 0.02,
displayGenomeBorder = FALSE,
genomeLabelTextSize = "10pt",
# genomeTicksScale = 4e+3, # problematic
elementId = "chra_chrb_comp_plot_test" # no periods
))

Problem while implementing the “Continual Learning Through Synaptic Intelligence” paper

I am trying to reproduce the results of the “Continual Learning Through Synaptic Intelligence” paper [1]. I tried implementing the algorithm as best as I could understand it after going through the paper many times. I also looked at its official implementation on GitHub, which is in TensorFlow 1.0, but I could not understand much of it as I don’t have much familiarity with that framework.
I got some results, but they are not as good as the paper’s. I wanted to ask if anyone can help me find out where I am going wrong. Before going into coding details, I want to discuss the pseudocode so that I understand what is going wrong with my implementation.
Here is the pseudocode that I have implemented. Please help me.
lambda = 1
xi = 1e-3
total_tasks = 5

model = NN(total_tasks)
## multiheaded linear model ([784(input) --> 256 --> 256 --> 2(output)] * 5, 5 separate heads)
## output layer is a 2-neuron head (separate heads for each task, 5 tasks in total)
## output is a vector of size 2 (for 2 classes)

prev_theta = model.theta(copy=True) # updated at the end of each task
## model.theta() returns the list of shared parameters (i.e. layer1 and layer2, excluding the output layer)
## copy=True gives a copy of the parameters,
## so it doesn't affect the original params connected to the computational graph

omega_total = zero_like(prev_theta) ## capital Omega in the paper (per-parameter regularization strength)
omega = zero_like(prev_theta)       ## small omega in the paper (per-parameter contribution to the loss)

for task_num in range(total_tasks):
    optimizer = ADAM() # created before every task (or reset)
    prev_theta_step = model.theta(copy=True) # updated at the end of each step

    ## training for the task starts
    for epoch in range(10):
        for step in range(steps_per_epoch):
            X, Y = train_dataset[task_num].sample()
            ## X is a flattened image of size 784
            ## Y is a binary vector of size 2 ([0,1] or [1,0])

            Y_pred = model(X, task_num) # model is multiheaded, task_num selects the head
            loss = CROSS_ENTROPY(Y_pred, Y)

            if task_num > 0: ## reg_loss starts from the second task
                theta = model.theta()
                ## here copy is not True, so it returns the params connected to the computational graph
                reg_loss = torch.sum(omega_total * torch.square(theta - prev_theta))
                loss = loss + lambda * reg_loss

            optimizer.zero_grad()
            loss.backward()

            theta = model.theta(copy=True)
            grads = model.theta_grads() ## grads of the shared parameters only
            omega = omega - grads * (theta - prev_theta_step)
            prev_theta_step = theta

            optimizer.step()

    ## training for the task is complete, update the importance parameters
    theta = model.theta(copy=True)
    omega_total += relu( omega / ((theta - prev_theta)**2 + xi) )
    prev_theta = theta
    omega = torch.zeros(theta_shape)

    ## evaluation code
    ...
    ## evaluation done
I am also attaching the results I got. In the results, ‘one’ (blue) represents training without the regularization loss (lambda=0), and ‘two’ (green) represents training with the regularization loss (lambda=1).
Thank you for reading this far. Kindly help me out.
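For reference, here is a minimal sketch of how model.theta(copy=True) and model.theta_grads() could be implemented in PyTorch (these helpers and the 'heads' name filter are my own assumptions, not taken from the paper or its official code):
def theta(model, copy=False):
    # shared parameters only; 'heads' is a hypothetical name for the task-specific output layers
    shared = [p for name, p in model.named_parameters() if not name.startswith('heads')]
    if copy:
        # detach().clone() returns tensors cut off from the computational graph
        return [p.detach().clone() for p in shared]
    return shared

def theta_grads(model):
    # gradients of the shared parameters only (call after loss.backward())
    return [p.grad.detach().clone() for p in theta(model)]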

When using the frame skipping wrapper for OpenAI Gym, what is the purpose of the np.max line?

I'm implementing the following wrapper used commonly in OpenAI's Gym for Frame Skipping. It can be found in dqn/atari_wrappers.py
I'm very confused about the following line:
max_frame = np.max(np.stack(self._obs_buffer), axis=0)
I have added comments throughout the code for the parts I understand and to aid anyone who may be able to help.
np.stack(self._obs_buffer) stacks the two states in _obs_buffer.
np.max returns the maximum along axis 0.
But what I don't understand is why we're doing this or what it's really doing.
import gym
import numpy as np
from collections import deque

class MaxAndSkipEnv(gym.Wrapper):
    """Return only every 4th frame"""
    def __init__(self, env=None, skip=4):
        super(MaxAndSkipEnv, self).__init__(env)
        # Initialise a double-ended queue that can store a maximum of two states
        self._obs_buffer = deque(maxlen=2)
        # _skip = 4
        self._skip = skip

    def _step(self, action):
        total_reward = 0.0
        done = None
        for _ in range(self._skip):
            # Take a step
            obs, reward, done, info = self.env.step(action)
            # Append the new state to the double-ended queue buffer
            self._obs_buffer.append(obs)
            # Update the total reward by summing the (reward obtained from the step taken)
            # + (the current total reward)
            total_reward += reward
            # If the game ends, break the for loop
            if done:
                break
        max_frame = np.max(np.stack(self._obs_buffer), axis=0)
        return max_frame, total_reward, done, info
At the end of the for loop, self._obs_buffer holds the last two frames.
Those two frames are then max-pooled over (an element-wise maximum), resulting in an observation that contains some temporal information.
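To make that concrete, here is a tiny made-up example (arbitrary values) showing that the maximum is taken element-wise, pixel by pixel, across the two stacked frames:
import numpy as np

# two tiny "frames" standing in for the last two observations in _obs_buffer
frame_a = np.array([[0, 255],
                    [10,  0]])
frame_b = np.array([[0,  0],
                    [40, 30]])

# np.stack gives shape (2, H, W); np.max over axis 0 is an element-wise maximum,
# so anything visible in only one of the two frames survives in the result
max_frame = np.max(np.stack([frame_a, frame_b]), axis=0)
print(max_frame)  # [[  0 255]
                  #  [ 40  30]]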

Custom environment Gym for step function processing with DDPG Agent

I'm new to reinforcement learning, and I would like to process an audio signal using this technique. I built a basic step signal that I wish to flatten, as a way to get my hands on OpenAI Gym and reinforcement learning in general.
To do so, I am using the GoalEnv provided by OpenAI, since I know what the target is: the flat signal.
Here is the image with the input and desired signals:
The step function calls _set_action, which performs achieved_signal = convolution(input_signal, low_pass_filter) - offset; low_pass_filter takes a cutoff frequency as input as well.
The cutoff frequency and offset are the parameters that act on the observation to produce the output signal.
The designed reward function returns the frame-to-frame L2 norm between the input signal and the desired signal, negated, so as to penalize a large norm.
Following is the environment I created:
import gym
import numpy as np
from gym import spaces
from scipy import signal

def butter_lowpass(cutoff, nyq_freq, order=4):
    normal_cutoff = float(cutoff) / nyq_freq
    b, a = signal.butter(order, normal_cutoff, btype='lowpass')
    return b, a

def butter_lowpass_filter(data, cutoff_freq, nyq_freq, order=4):
    b, a = butter_lowpass(cutoff_freq, nyq_freq, order=order)
    y = signal.filtfilt(b, a, data)
    return y

class StepSignal(gym.GoalEnv):

    def __init__(self, input_signal, sample_rate, desired_signal):
        super(StepSignal, self).__init__()

        self.initial_signal = input_signal
        self.signal = self.initial_signal.copy()
        self.sample_rate = sample_rate
        self.desired_signal = desired_signal
        self.distance_threshold = 10e-1

        max_offset = abs(max( max(self.desired_signal) , max(self.signal))
                     - min( min(self.desired_signal) , min(self.signal)) )

        self.action_space = spaces.Box(low=np.array([10e-4, -max_offset]),
                                       high=np.array([self.sample_rate/2-0.1, max_offset]),
                                       dtype=np.float16)

        obs = self._get_obs()
        self.observation_space = spaces.Dict(dict(
            desired_goal=spaces.Box(-np.inf, np.inf, shape=obs['achieved_goal'].shape, dtype='float32'),
            achieved_goal=spaces.Box(-np.inf, np.inf, shape=obs['achieved_goal'].shape, dtype='float32'),
            observation=spaces.Box(-np.inf, np.inf, shape=obs['observation'].shape, dtype='float32'),
        ))

    def step(self, action):
        # rescale: maps an action in [-1, 1] to [0, high - low]
        range = self.action_space.high - self.action_space.low
        action = range / 2 * (action + 1)
        self._set_action(action)
        obs = self._get_obs()
        done = False

        info = {
            'is_success': self._is_success(obs['achieved_goal'], self.desired_signal),
        }
        reward = -self.compute_reward(obs['achieved_goal'], self.desired_signal)
        return obs, reward, done, info

    def reset(self):
        self.signal = self.initial_signal.copy()
        return self._get_obs()

    def _set_action(self, actions):
        actions = np.clip(actions, a_max=self.action_space.high, a_min=self.action_space.low)
        cutoff = actions[0]
        offset = actions[1]
        print(cutoff, offset)
        self.signal = butter_lowpass_filter(self.signal, cutoff, self.sample_rate/2) - offset

    def _get_obs(self):
        obs = self.signal
        achieved_goal = self.signal
        return {
            'observation': obs.copy(),
            'achieved_goal': achieved_goal.copy(),
            'desired_goal': self.desired_signal.copy(),
        }

    def compute_reward(self, goal_achieved, goal_desired):
        d = np.linalg.norm(goal_desired - goal_achieved)
        return d

    def _is_success(self, achieved_goal, desired_goal):
        d = self.compute_reward(achieved_goal, desired_goal)
        return (d < self.distance_threshold).astype(np.float32)
The environment can then be instantiated into a variable, and flattened through the FlattenDictWrapper as advised here https://openai.com/blog/ingredients-for-robotics-research/ (end of the page).
length = 20
sample_rate = 30 # 30 Hz
in_signal_length = 20*sample_rate # 20sec signal
x = np.linspace(0, length, in_signal_length)
# Desired output
y = 3*np.ones(in_signal_length)
# Step signal
in_signal = 0.5*(np.sign(x-5)+9)
env = gym.make('stepsignal-v0', input_signal=in_signal, sample_rate=sample_rate, desired_signal=y)
env = gym.wrappers.FlattenDictWrapper(env, dict_keys=['observation','desired_goal'])
env.reset()
The agent is a DDPG Agent from keras-rl, since the actions can take any values in the continuous action_space described in the environment.
I wonder why the actor and critic nets need an input with an additional dimension, in input_shape=(1,) + env.observation_space.shape
from keras.models import Sequential, Model
from keras.layers import Dense, Activation, Flatten, Input, Concatenate
from keras.optimizers import Adam
from rl.agents import DDPGAgent
from rl.memory import SequentialMemory
from rl.policy import BoltzmannQPolicy
from rl.random import OrnsteinUhlenbeckProcess

nb_actions = env.action_space.shape[0]

# Building Actor agent (Policy-net)
actor = Sequential()
actor.add(Flatten(input_shape=(1,) + env.observation_space.shape, name='flatten'))
actor.add(Dense(128))
actor.add(Activation('relu'))
actor.add(Dense(64))
actor.add(Activation('relu'))
actor.add(Dense(nb_actions))
actor.add(Activation('linear'))
actor.summary()
# Building Critic net (Q-net)
action_input = Input(shape=(nb_actions,), name='action_input')
observation_input = Input(shape=(1,) + env.observation_space.shape, name='observation_input')
flattened_observation = Flatten()(observation_input)
x = Concatenate()([action_input, flattened_observation])
x = Dense(128)(x)
x = Activation('relu')(x)
x = Dense(64)(x)
x = Activation('relu')(x)
x = Dense(1)(x)
x = Activation('linear')(x)
critic = Model(inputs=[action_input, observation_input], outputs=x)
critic.summary()
# Building Keras agent
memory = SequentialMemory(limit=2000, window_length=1)
policy = BoltzmannQPolicy()
random_process = OrnsteinUhlenbeckProcess(size=nb_actions, theta=0.6, mu=0, sigma=0.3)
agent = DDPGAgent(nb_actions=nb_actions, actor=actor, critic=critic, critic_action_input=action_input,
memory=memory, nb_steps_warmup_critic=2000, nb_steps_warmup_actor=10000,
random_process=random_process, gamma=.99, target_model_update=1e-3)
agent.compile(Adam(lr=1e-3, clipnorm=1.), metrics=['mae'])
Finally, the agent is trained:
import pickle

filename = 'mem20k_heaviside_flattening'

hist = agent.fit(env, nb_steps=10, visualize=False, verbose=2, nb_max_episode_steps=5)
with open('./history_dqn_test_' + filename + '.pickle', 'wb') as handle:
    pickle.dump(hist.history, handle, protocol=pickle.HIGHEST_PROTOCOL)
agent.save_weights('h5f_files/dqn_{}_weights.h5f'.format(filename), overwrite=True)
Now here is the catch: the agent always seems to be stuck in the same neighborhood of output values across all episodes, for the same instance of my env:
The cumulative reward is negative, since I only allowed the agent to get negative rewards. I took this approach from https://github.com/openai/gym/blob/master/gym/envs/robotics/fetch_env.py, which is part of the OpenAI code, as an example.
Across one episode, I should get varying sets of actions converging towards a (cutoff_final, offset_final) that would bring my input step signal close to my flat output signal, which is clearly not the case. In addition, I thought that for successive episodes I should get different actions.
I wonder why the actor and critic nets need an input with an additional dimension, in input_shape=(1,) + env.observation_space.shape
I think the GoalEnv is designed with HER (Hindsight Experience Replay) in mind, since it will use the "sub-spaces" inside the observation_space to learn from sparse reward signals (there is a paper on the OpenAI website that explains how HER works). I haven't looked at the implementation, but my guess is that there needs to be an additional input since HER also processes the "goal" parameter.
Since it seems you are not using HER (it works with any off-policy algorithm, including DQN, DDPG, etc.), you should handcraft an informative reward function (rewards that are not binary, e.g. 1 if the objective is achieved, 0 otherwise) and use the base Env class. The reward should be calculated inside the step method; since rewards in MDPs are functions like r(s, a, s'), you will probably have all the information you need there. Hope it helps.
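As a rough sketch of that suggestion (my own illustration, with assumed names and shapes, not code from the question): a plain gym.Env whose step() returns a dense reward, the negated L2 distance to the desired signal, instead of a sparse success/failure signal:
import gym
import numpy as np
from gym import spaces

class DenseStepSignal(gym.Env):
    """Sketch: the same task, but as a plain Env with a dense reward."""
    def __init__(self, input_signal, sample_rate, desired_signal):
        super(DenseStepSignal, self).__init__()
        self.initial_signal = input_signal
        self.signal = input_signal.copy()
        self.sample_rate = sample_rate
        self.desired_signal = desired_signal
        # action = (cutoff, offset), normalised to [-1, 1] as in the original step()
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
        self.observation_space = spaces.Box(-np.inf, np.inf,
                                            shape=input_signal.shape, dtype=np.float32)

    def step(self, action):
        # ... apply the low-pass filter and offset here, as in the original _set_action ...
        reward = -np.linalg.norm(self.desired_signal - self.signal)  # dense, not binary
        return self.signal.copy(), reward, False, {}

    def reset(self):
        self.signal = self.initial_signal.copy()
        return self.signal.copy()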

Dice and CE loss not training network together

I am training a segmentation network on the Kaggle Salt challenge. My Dice and CE losses decrease, but then suddenly the Dice loss increases and CE jumps up a bit, and this keeps happening to the Dice loss. I have been trying all day to fix this but can't get it to work. I am running on only 10 data points to overfit the data, but it just is not happening. Any help would be greatly appreciated.
Plots of dice (top) and CE:
[Loss curve]
Here are my dice and train functions:
import numpy as np
import torch

def dice(input, target, weights=torch.tensor([1, 1]).float().cuda()):
    smooth = .001
    dummy = np.zeros([batch_size, 2, 100, 100])  # create dummy to one-hot encode target for weighted dice
    dummy[:, 0, :, :][target == 0] = 1  # background class is 0
    dummy[:, 1, :, :][target == 1] = 1  # salt class is 1
    target = torch.tensor(dummy).float().cuda()
    # print(input.size(), input[:,0,:,:].size())

    input1 = input[:, 0, :, :].contiguous().view(-1)  # flatten both classes separately
    target1 = target[:, 0, :, :].contiguous().view(-1)
    input2 = input[:, 1, :, :].contiguous().view(-1)
    target2 = target[:, 1, :, :].contiguous().view(-1)

    score1 = 2 * (input1 * target1).sum() / (input1.sum() + target1.sum() + smooth)  # background
    score2 = 2 * (input2 * target2).sum() / (input2.sum() + target2.sum() + smooth)  # salt
    score = 1 - (weights[0] * score1 + weights[1] * score2) / 2
    if score < 0:
        score = score - score  # clamp negative values to zero
    return score
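(As an aside, not part of the original kernel: the NumPy dummy round-trip above could also be written directly on tensors with torch.nn.functional.one_hot, assuming target holds integer class labels of shape [batch, H, W].)
import torch
import torch.nn.functional as F

# target: tensor of shape [batch, H, W] with values in {0, 1}
# one_hot puts the class dimension last, so permute it to [batch, 2, H, W]
target_onehot = F.one_hot(target.long(), num_classes=2).permute(0, 3, 1, 2).float()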
Here is the train function:
def train(epoch):
    for idx, batch_data in enumerate(dataloader):
        x, target = batch_data['image'].float().cuda(), batch_data['label'].float().cuda()
        optimizer.zero_grad()
        output = net(x)
        # print(output.size())
        output.squeeze_(1)
        # print('out', output.size(), target.size())

        bce_loss = criterion(output, target.long())
        lc.append(bce_loss.item())
        dice_loss = dice(output, target)
        ld.append(dice_loss.item())
        loss = dice_loss + bce_loss
        l.append(loss.item())

        loss.backward()
        optimizer.step()
    print('Epoch {}, loss {}, bce {}, dice {}'.format(
        epoch, sum(l)/len(l), sum(lc)/len(lc), sum(ld)/len(ld)))
Here is the rest of the code (downloaded from the Kaggle kernel): https://github.com/bluesky314/Salt-Segmentation/blob/master/kernel-2.ipynb (the training shown here is from when I ran that cell (14) a second time, so the ups and downs don't appear, but they can be seen in the plot).
dataset = DatasetSalt(limit_paths=10) just limits the dataset to any number, by only taking the top paths to get the images from.
Would really appreciate any help; I have been literally struggling with this for 8+ hours.