Incrementing an int variable from a function

I am trying to get input from my user to assign a 'faction' and, based on subsequent inputs, modify the values of curAdvRep/curCrmRep.
The output shows that inside the function I get the desired result, but I need to be able to permanently modify the rep for the faction.
From file 2, which is called from file 1:
curAdvRep = 0
curCrmRep = 0
Crimson = "Crimson Brotherhood reputation: {0}".format(curCrmRep)
Advent = "Advent of Chaos reputation: {0}".format(curAdvRep)
PathSelDict = {'Advent' : Advent, 'Crimson' : Crimson, 'n' : n, 'c' : cont, 'd' : d, 'p' : p, 'l' : l}

def Faction(rep):
    global curAdvRep
    global curCrmRep
    global Advent
    global Crimson
    if rep in PathSelDict:
        if rep == 'Advent':
            curAdvRep += 50
            curCrmRep -= 5
            Advent = "Advent of Chaos reputation: {0}".format(curAdvRep)
            print(Advent)
            #print(Factions[JoinWorld])
        elif rep == 'Crimson':
            curAdvRep -= 5
            curCrmRep += 50
            print(PathSelDict[JoinWorld])
    else:
        print(Dismiss)
        sys.exit(0)
From file 1:
rep = input("Which side are you on? Advent or Crimson? ").title()
questfunc.Faction(rep)
print(Advent)
print(curAdvRep)
print(curCrmRep)
Output:
Pick up the box or leave it alone? (p or l): p
Pick up the box
Reputation Gain
Advent of Chaos reputation: 5
Which side are you on? Advent or Crimson? advent
Advent of Chaos reputation: 55
Advent of Chaos reputation: 0
0
0
I am sorry if my question or my code is poorly formed. I have researched an answer to my question, but between not finding a matching answer and my inability to translate an indirect answer to my specific case, I haven't found the solution.

So I found a workaround. For those who have also asked this question and had no helpful response, please take a look.
Choice1 = input(" Pick up the box or leave it alone? (p or l): ").lower()
RepGain(Choice1)
print(Advent, curAdvRep)

# Function for gain
def RepGain(Choice):
    global curAdvRep
    global curCrmRep
    global Advent
    global Crimson
    if Choice in PathSelDict:
        if Choice == 'p':
            print('Reputation Gain\nAdvent + 100')
            curAdvRep += 100
            return curAdvRep
        elif Choice == 'l':
            print('Crimson + 100')
            curCrmRep += 100
            return curCrmRep
    else:
        print(Dismiss)
        sys.exit(0)
Output:
Reputation Gain
Advent + 100
Advent of Chaos reputation: 100
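For completeness, the behavior the original code ran into is a property of Python imports rather than of functions: a from-import like from file2 import curAdvRep binds a name in file 1 to the current value of the integer, and when Faction later rebinds file2.curAdvRep, file 1's name still points at the old object. Reading the value through the module object always sees the update. A minimal sketch, assuming file 2 is the questfunc module used in the call above:

# questfunc.py
curAdvRep = 0

def Faction(rep):
    global curAdvRep        # rebinds the module-level name questfunc.curAdvRep
    if rep == 'Advent':
        curAdvRep += 50

# file1.py
import questfunc            # keep a reference to the module, not a copy of the int

questfunc.Faction('Advent')
print(questfunc.curAdvRep)  # -> 50, always reads the current module attribute

Returning the updated value, as the workaround does, sidesteps the stale imported name in the same way.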

Related

Python- Return a verified variable from a function to the main program

Can anyone please direct me to an example where one can send a user-input variable to a checking function or module and return the validated input, assigning/updating the initialised variable? I am trying to re-create something I did in C++ many years ago: validating an integer! In this particular case, validating that the number of bolts input for a building-frame connection is one. Any direction would be greatly appreciated, as my internet searches and trawls through my copy of Python A Crash Course have yet to shed any light! Many thanks in anticipation that someone will feel benevolent towards a Python newbie!
Regards Steve
Below is one of my numerous attempts at this; really I would just like to abandon this approach and use a while loop and a function call. In this one, apparently I am not allowed to put > (line 4) between a str and an int, this despite my attempt to force N to be an int in the penultimate line!
def int_val(N):
    #checks
    # check 1. n > 0 for real entries
    N > 0
    isinstance(N, int)
    N = N
    return N
    print("N not ok enter again")

#N = input("Input N the Number of bolts ")
# Initialiase N=0
#N = 0
# Enter the number of bolts
N = input("Input N the Number of bolts ")
int_val(N)
print("no of bolts is", N)
Is something like this what you have in mind? It takes advantage of the fact that using the built-in int function will convert a string to an integer if possible, but otherwise throw a ValueError.
def str_to_posint(s):
    """Return value if converted and greater than zero else None."""
    try:
        num = int(s)
        return num if num > 0 else None
    except ValueError:
        return None

while True:
    s = input("Enter number of bolts: ")
    if num_bolts := str_to_posint(s):
        break
    print(f"Sorry, \"{s}\" is not a valid number of bolts.")

print(f"{num_bolts = }")
Output:
Enter number of bolts: twenty
Sorry, "twenty" is not a valid number of bolts.
Enter number of bolts: 20
num_bolts = 20
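One caveat: both the := assignment expression and the f"{num_bolts = }" debug format require Python 3.8 or newer. On older versions, the same loop can be written as:

while True:
    s = input("Enter number of bolts: ")
    num_bolts = str_to_posint(s)
    if num_bolts:
        break
    print("Sorry, \"{}\" is not a valid number of bolts.".format(s))

print("num_bolts =", num_bolts)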

Can't save data using push button (MATLAB)

I'm trying to create a figure where the user can select cells to turn on or off. Then, the user can click a button 'Enter' to save the board as an array. I successfully found a way to create interactivity in my plot thanks to a very useful explanation I found here. I just made some changes to suit my needs.
However, I can't find a way to save the board. The button is working (or at least it isn't erroring), but the data isn't saved, and I don't know how to fix that. Any help would be appreciated.
Here is my code:
function CreatePattern
hFigure = figure;
hAxes = axes;
axis equal;
axis off;
hold on;

% create a button to calculate the difference between 2 points
h = uicontrol('Position',[215 5 150 30],'String','Enter','Callback',@SaveArray);

    function SaveArray(ButtonH, eventdata)
        global initial
        initial = Board;
        close(hFigure)
    end

N = 1; % for line width
M = 20; % board size
squareEdgeSize = 1;

% create the board of patch objects
hPatchObjects = zeros(M,M);
for j = M:-1:1
    for k = 1:M
        hPatchObjects(M - j + 1, k) = rectangle('Position', [k*squareEdgeSize, j*squareEdgeSize, squareEdgeSize, squareEdgeSize], 'FaceColor', [0 0 0],...
            'EdgeColor', 'w', 'LineWidth', N, 'HitTest', 'on', 'ButtonDownFcn', {@OnPatchPressedCallback, M - j + 1, k});
    end
end

%global Board
Board = zeros(M,M);
playerColours = [1 1 1; 0 0 0];
xlim([squareEdgeSize M*squareEdgeSize]);
ylim([squareEdgeSize M*squareEdgeSize]);

    function OnPatchPressedCallback(hObject, eventdata, rowIndex, colIndex)
        % change FaceColor to player colour
        value = Board(rowIndex,colIndex);
        if value == 1
            set(hObject, 'FaceColor', playerColours(2, :));
            Board(rowIndex,colIndex) = 0; % update board
        else
            set(hObject, 'FaceColor', playerColours(1, :));
            Board(rowIndex,colIndex) = 1; % update board
        end
    end
end
%imwrite(~pattern,'custom_pattern.jpeg')

Q values overshoot in Double Deep Q Learning

I am trying to teach the agent to play the ATARI Space Invaders video game, but my Q values overshoot.
I have clipped positive rewards to 1 (the agent also receives -1 for losing a life), so the maximum expected return should be around 36 (maybe I am wrong about this). I have also implemented the Huber loss.
I have noticed that when my Q values start overshooting, the agent stops improving (the reward stops increasing).
Code can be found here
Plots can be found here
Note:
I have binarized frames so that I can use a bigger replay buffer (my replay buffer size is 300 000, which is 3 times smaller than in the original paper)
EDIT:
I have binarized frames so I can use 1 bit (instead of 8 bits) to store one pixel of the image in the replay buffer, using the numpy.packbits function. That way I can use an 8 times bigger replay buffer. I have checked whether the image is distorted after packing it with packbits, and it is NOT, so sampling from the replay buffer works fine. This is the main loop of the code (maybe the problem is in there):
frame_count = 0
LIFE_CHECKPOINT = 3
for episode in range(EPISODE, EPISODES):
    # reset the environment and init variables
    frames, _, _ = space_invaders.resetEnv(NUM_OF_FRAMES)
    state = stackFrames(frames)
    done = False
    episode_reward = 0
    episode_reward_clipped = 0
    frames_buffer = frames  # contains preprocessed frames (not stacked)
    while not done:
        if (episode % REPORT_EPISODE_FREQ == 0):
            space_invaders.render()
        # select an action from behaviour policy
        action, Q_value, is_greedy_action = self.EGreedyPolicy(Q, state, epsilon, len(ACTIONS))
        # perform action in the environment
        observation, reward, done, info = space_invaders.step(action)
        episode_reward += reward  # update episode reward
        reward, LIFE_CHECKPOINT = self.getCustomReward(reward, info, LIFE_CHECKPOINT)
        episode_reward_clipped += reward
        frame = preprocessFrame(observation, RESOLUTION)
        # pop first frame from the buffer, and add new at the end (s1=[f1,f2,f3,f4], s2=[f2,f3,f4,f5])
        frames_buffer.append(frame)
        frames_buffer.pop(0)
        new_state = stackFrames(frames_buffer)
        # add (s,a,r,s') tuple to the replay buffer
        replay_buffer.add(packState(state), action, reward, packState(new_state), done)
        state = new_state  # new state becomes current state
        frame_count += 1
        if (replay_buffer.size() > MIN_OBSERVATIONS):  # if there is enough data in replay buffer
            Q_values.append(Q_value)
            if (frame_count % TRAINING_FREQUENCY == 0):
                batch = replay_buffer.sample(BATCH_SIZE)
                loss = Q.train_network(batch, BATCH_SIZE, GAMMA, len(ACTIONS))
                losses.append(loss)
                num_of_weight_updates += 1
            if (epsilon > EPSILON_END):
                epsilon = self.decayEpsilon(epsilon, EPSILON_START, EPSILON_END, FINAL_EXPLORATION_STATE)
            if (num_of_weight_updates % TARGET_NETWORK_UPDATE_FREQ == 0) and (num_of_weight_updates != 0):  # update weights of target network
                Q.update_target_network()
                print("Target_network is updated!")
    episode_rewards.append(episode_reward)
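As a quick sanity check on the 1-bit storage described in the edit above, numpy.packbits/numpy.unpackbits round-trip a binarized 84x84 frame losslessly, since 84*84 = 7056 bits is a multiple of 8:

import numpy as np

frame = (np.random.rand(84, 84) > 0.5).astype(np.uint8)  # binarized frame
packed = np.packbits(frame)                    # 882 bytes instead of 7056
restored = np.unpackbits(packed).reshape(84, 84)
assert (restored == frame).all()               # lossless round trip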
I have also checked the Q.train_network and Q.update_target_network functions and they work fine.
I was wondering if the problem could be in the hyperparameters:
ACTIONS = {"NOOP":0,"FIRE":1,"RIGHT":2,"LEFT":3,"RIGHTFIRE":4,"LEFTFIRE":5}
NUM_OF_FRAMES = 4 # number of frames that make 1 state
EPISODES = 10000 # number of episodes
BUFFER_SIZE = 300000 # size of the replay buffer (cannot use a bigger size due to RAM)
MIN_OBSERVATIONS = 30000
RESOLUTION = 84 # resolution of frames
BATCH_SIZE = 32
EPSILON_START = 1 # starting value for the exploration probability
EPSILON_END = 0.1
FINAL_EXPLORATION_STATE = 300000 # final frame for which epsilon is decayed
GAMMA = 0.99 # discount factor
TARGET_NETWORK_UPDATE_FREQ = 10000
REPORT_EPISODE_FREQ = 100
TRAINING_FREQUENCY = 4
OPTIMIZER = RMSprop(lr=0.00025,rho=0.95,epsilon=0.01)
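One thing worth double-checking, since the title says Double Deep Q Learning: overshooting Q values are the classic symptom of the max operator's overestimation bias, which Double DQN avoids only if the online network selects the greedy next action and the target network evaluates it. A minimal sketch of the target computation (the function name and batch layout are hypothetical, not taken from the linked code):

import numpy as np

def double_dqn_targets(rewards, next_states, dones, online_model, target_model, gamma):
    """Hypothetical helper: Double DQN targets for one sampled batch."""
    # action selection with the ONLINE network ...
    best_actions = np.argmax(online_model.predict(next_states), axis=1)
    # ... action evaluation with the TARGET network
    next_q = target_model.predict(next_states)
    chosen_q = next_q[np.arange(len(rewards)), best_actions]
    # terminal transitions bootstrap to zero
    return rewards + gamma * chosen_q * (1.0 - dones)

If both steps use the target network, the agent is effectively running vanilla DQN and the overestimation returns.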

Custom environment Gym for step function processing with DDPG Agent

I'm new to reinforcement learning, and I would like to process an audio signal using this technique. I built a basic step function that I wish to flatten, in order to get my hands on Gym OpenAI and reinforcement learning in general.
To do so, I am using the GoalEnv provided by OpenAI, since I know what the target is: the flat signal.
[Image: the input step signal and the desired flat signal]
The step function calls _set_action, which performs achieved_signal = convolution(input_signal,low_pass_filter) - offset; low_pass_filter takes a cutoff frequency as input as well.
The cutoff frequency and the offset are the parameters that act on the observation to produce the output signal.
The designed reward function returns the frame-to-frame L2 norm between the input signal and the desired signal, negated, to penalize a large norm.
Following is the environment I created:
def butter_lowpass(cutoff, nyq_freq, order=4):
    normal_cutoff = float(cutoff) / nyq_freq
    b, a = signal.butter(order, normal_cutoff, btype='lowpass')
    return b, a

def butter_lowpass_filter(data, cutoff_freq, nyq_freq, order=4):
    b, a = butter_lowpass(cutoff_freq, nyq_freq, order=order)
    y = signal.filtfilt(b, a, data)
    return y

class StepSignal(gym.GoalEnv):

    def __init__(self, input_signal, sample_rate, desired_signal):
        super(StepSignal, self).__init__()
        self.initial_signal = input_signal
        self.signal = self.initial_signal.copy()
        self.sample_rate = sample_rate
        self.desired_signal = desired_signal
        self.distance_threshold = 10e-1
        max_offset = abs(max( max(self.desired_signal) , max(self.signal))
                         - min( min(self.desired_signal) , min(self.signal)) )
        self.action_space = spaces.Box(low=np.array([10e-4,-max_offset]),
                high=np.array([self.sample_rate/2-0.1,max_offset]), dtype=np.float16)
        obs = self._get_obs()
        self.observation_space = spaces.Dict(dict(
            desired_goal=spaces.Box(-np.inf, np.inf, shape=obs['achieved_goal'].shape, dtype='float32'),
            achieved_goal=spaces.Box(-np.inf, np.inf, shape=obs['achieved_goal'].shape, dtype='float32'),
            observation=spaces.Box(-np.inf, np.inf, shape=obs['observation'].shape, dtype='float32'),
        ))

    def step(self, action):
        range = self.action_space.high - self.action_space.low
        action = range / 2 * (action + 1)
        self._set_action(action)
        obs = self._get_obs()
        done = False
        info = {
            'is_success': self._is_success(obs['achieved_goal'], self.desired_signal),
        }
        reward = -self.compute_reward(obs['achieved_goal'], self.desired_signal)
        return obs, reward, done, info

    def reset(self):
        self.signal = self.initial_signal.copy()
        return self._get_obs()

    def _set_action(self, actions):
        actions = np.clip(actions, a_max=self.action_space.high, a_min=self.action_space.low)
        cutoff = actions[0]
        offset = actions[1]
        print(cutoff, offset)
        self.signal = butter_lowpass_filter(self.signal, cutoff, self.sample_rate/2) - offset

    def _get_obs(self):
        obs = self.signal
        achieved_goal = self.signal
        return {
            'observation': obs.copy(),
            'achieved_goal': achieved_goal.copy(),
            'desired_goal': self.desired_signal.copy(),
        }

    def compute_reward(self, goal_achieved, goal_desired):
        d = np.linalg.norm(goal_desired - goal_achieved)
        return d

    def _is_success(self, achieved_goal, desired_goal):
        d = self.compute_reward(achieved_goal, desired_goal)
        return (d < self.distance_threshold).astype(np.float32)
The environment can then be instantiated into a variable and flattened through the FlattenDictWrapper, as advised here: https://openai.com/blog/ingredients-for-robotics-research/ (end of the page).
length = 20
sample_rate = 30 # 30 Hz
in_signal_length = 20*sample_rate # 20sec signal
x = np.linspace(0, length, in_signal_length)
# Desired output
y = 3*np.ones(in_signal_length)
# Step signal
in_signal = 0.5*(np.sign(x-5)+9)
env = gym.make('stepsignal-v0', input_signal=in_signal, sample_rate=sample_rate, desired_signal=y)
env = gym.wrappers.FlattenDictWrapper(env, dict_keys=['observation','desired_goal'])
env.reset()
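For gym.make('stepsignal-v0', ...) to resolve that id, the environment has to be registered first; constructor keyword arguments are then forwarded to StepSignal.__init__. A minimal sketch, where the entry_point module path is hypothetical:

from gym.envs.registration import register

register(
    id='stepsignal-v0',
    entry_point='my_envs.step_signal:StepSignal',  # hypothetical module path
    max_episode_steps=5,
)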
The agent is a DDPG Agent from keras-rl, since the actions can take any values in the continuous action_space described in the environment.
I wonder why the actor and critic nets need an input with an additional dimension, in input_shape=(1,) + env.observation_space.shape
nb_actions = env.action_space.shape[0]
# Building Actor agent (Policy-net)
actor = Sequential()
actor.add(Flatten(input_shape=(1,) + env.observation_space.shape, name='flatten'))
actor.add(Dense(128))
actor.add(Activation('relu'))
actor.add(Dense(64))
actor.add(Activation('relu'))
actor.add(Dense(nb_actions))
actor.add(Activation('linear'))
actor.summary()
# Building Critic net (Q-net)
action_input = Input(shape=(nb_actions,), name='action_input')
observation_input = Input(shape=(1,) + env.observation_space.shape, name='observation_input')
flattened_observation = Flatten()(observation_input)
x = Concatenate()([action_input, flattened_observation])
x = Dense(128)(x)
x = Activation('relu')(x)
x = Dense(64)(x)
x = Activation('relu')(x)
x = Dense(1)(x)
x = Activation('linear')(x)
critic = Model(inputs=[action_input, observation_input], outputs=x)
critic.summary()
# Building Keras agent
memory = SequentialMemory(limit=2000, window_length=1)
policy = BoltzmannQPolicy()
random_process = OrnsteinUhlenbeckProcess(size=nb_actions, theta=0.6, mu=0, sigma=0.3)
agent = DDPGAgent(nb_actions=nb_actions, actor=actor, critic=critic, critic_action_input=action_input,
                  memory=memory, nb_steps_warmup_critic=2000, nb_steps_warmup_actor=10000,
                  random_process=random_process, gamma=.99, target_model_update=1e-3)
agent.compile(Adam(lr=1e-3, clipnorm=1.), metrics=['mae'])
Finally, the agent is trained:
filename = 'mem20k_heaviside_flattening'
hist = agent.fit(env, nb_steps=10, visualize=False, verbose=2, nb_max_episode_steps=5)
with open('./history_dqn_test_' + filename + '.pickle', 'wb') as handle:
    pickle.dump(hist.history, handle, protocol=pickle.HIGHEST_PROTOCOL)
agent.save_weights('h5f_files/dqn_{}_weights.h5f'.format(filename), overwrite=True)
Now here is the catch: the agent always seems to be stuck in the same neighborhood of output values across all episodes for the same instance of my env.
The cumulative reward is negative since I only allowed the agent to receive negative rewards. I took this from https://github.com/openai/gym/blob/master/gym/envs/robotics/fetch_env.py, which is part of OpenAI's code, as an example.
Across one episode, I should get varying sets of actions converging towards a (cutoff_final, offset_final) that would bring my input step signal close to my flat output signal, which is clearly not the case. In addition, I thought I should get different actions across successive episodes.
I wonder why the actor and critic nets need an input with an additional dimension, in input_shape=(1,) + env.observation_space.shape
I think the GoalEnv is designed with HER (Hindsight Experience Replay) in mind, since it will use the "sub-spaces" inside the observation_space to learn from sparse reward signals (there is a paper on the OpenAI website that explains how HER works). I haven't looked at the implementation, but my guess is that there needs to be an additional input since HER also processes the "goal" parameter.
Since it seems you are not using HER (it works with any off-policy algorithm, including DQN, DDPG, etc.), you should handcraft an informative reward function (rewards are not binary, e.g. 1 if the objective is achieved, 0 otherwise) and use the base Env class. The reward should be calculated inside the step method; since rewards in MDPs are functions like r(s, a, s'), you will probably have all the information you need there. Hope it helps.
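Regarding the extra leading dimension itself: in keras-rl, each training sample is a stack of window_length consecutive observations drawn from SequentialMemory, so the networks see inputs of shape (window_length,) + env.observation_space.shape. With window_length=1, as in the SequentialMemory above, that is (1,) + env.observation_space.shape, which both models immediately Flatten away. In this setup the flattened dict observation should be 600 samples of 'observation' plus 600 of 'desired_goal':

print((1,) + env.observation_space.shape)  # -> (1, 1200) with window_length=1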

When calling different functions in the same class, only the first function is ever called(Python) [duplicate]

This question already has answers here:
How to test multiple variables for equality against a single value?
(31 answers)
Closed 6 years ago.
I'm having a bit of trouble with my class methods. My class has 3 different methods, but whenever I call one of them from outside the class, only the first one is ever called, despite my typing in the correct method name.
This is the class below with the different methods, although I have only included two, as I don't want you to have to search through lots of code.
class mage(baseclass):
    def __init__(self, name, level, attack, defence, hp):
        baseclass.__init__(self, name, level, hp)
        self.attack = attack
        self.defence = defence

    def __str__(self):
        return "You are now a Mage, your new stats are:\n Level: {0}\n Attack: {1}\n Defence: {2}\n HP: {3}".format(self.level, self.attack, self.defence, self.hp)

    def flamevortex(self, x, y, z):
        print("You used Flame Vortex")
        time.sleep(1.5)
        damageofmove = 3
        damagedone = damageofmove*y
        damagedoneafterdefence = damagedone - z
        x = x - damagedoneafterdefence
        print("The monster's health is now " + str(x))
        time.sleep(1.5)
        return x

    def lightningbolt(self, x, y, z):
        print("You used Lightning Bolt")
        time.sleep(1.5)
        damageofmove = 3
        damagedone = damageofmove*y
        damagedoneafterdefence = damagedone - z
        x = x - damagedoneafterdefence
        print("The monster's health is now " + str(x))
        time.sleep(1.5)
        return x
This is the place where I am calling the functions:
if Userattack.upper() == "FLAMEVORTEX" or "FLAME VORTEX":
    monster1.hp = p1.flamevortex(monster1.hp, p1.attack, monster1.defence)
    if chosenmove == monsterattacks[0]:
        p1.hp = monsterlasersword(p1.hp)
    elif chosenmove == monsterattacks[1]:
        p1.hp = monsterswipe(p1.hp)
    elif chosenmove == monsterattacks[2]:
        monster1.hp = monsterregen(monster1.hp)
    time.sleep(1.5)
    print("After the monster's attacks, your hp is now " + str(p1.hp))
elif Userattack.upper() == "LIGHTNINGBOLT" or "LIGHTNING BOLT":
    monster1.hp = p1.lightningbolt(monster1.hp, p1.attack, monster1.defence)
    if chosenmove == monsterattacks[0]:
        p1.hp = monsterlasersword(p1.hp)
    elif chosenmove == monsterattacks[1]:
        p1.hp = monsterswipe(p1.hp)
    elif chosenmove == monsterattacks[2]:
        monster1.hp = monsterregen(monster1.hp)
    time.sleep(1.5)
    print("After the monster's attacks, your hp is now " + str(p1.hp))
No matter what the user inputs, it only ever calls the first function.
I know this is a lot to process and I appreciate any help. Thanks
if Userattack.upper() == "FLAMEVORTEX" or "FLAME VORTEX": means: is Userattack.upper() equal to "FLAMEVORTEX", or does the string "FLAME VORTEX" have a True value?
Since empty strings are False and non-empty strings are True, Userattack.upper() == "FLAMEVORTEX" or "FLAME VORTEX" is always True, and that's not what you meant.
Try: Userattack.upper() == "FLAMEVORTEX" or Userattack.upper() == "FLAME VORTEX"
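To see why the original condition always takes the first branch, here is a minimal demonstration of that truthiness rule:

user_attack = "lightning bolt".upper()                     # 'LIGHTNING BOLT'
# Buggy: the right operand of `or` is a non-empty string, which is always
# truthy, so the whole expression is truthy no matter what was typed.
print(user_attack == "FLAMEVORTEX" or "FLAME VORTEX")      # -> 'FLAME VORTEX'
# Fixed: compare both spellings, or use a membership test.
print(user_attack in ("FLAMEVORTEX", "FLAME VORTEX"))      # -> False
print(user_attack in ("LIGHTNINGBOLT", "LIGHTNING BOLT"))  # -> True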