How do I stop a Python program once a variable reaches 0 or less?

I am building a text rpg for my CS class project. Everything works, except my lose condition. It seems like such a simple issue, and I must be overlooking a simple answer.
while health > 0:
    opening()
    if health <= 0:
        lose_condition()
If the health variable reaches 0 or less, the program keeps running as usual. The if statement for health <= 0 is never triggered.

We do not have the full code, so here is a quick working demo with a comment showing the key line. My assumption is that the combat functions are not declaring the health variable as global, or alternatively, something else keeps resetting health.
import threading
import time

def attack_thread():
    global health  # Without this line, the function could not modify the module-level health
    print('Damage thread start ' + str(health))
    while health >= 0:
        health -= 9
        time.sleep(0.5)
    print('Damage thread end ' + str(health))

def game_loop():
    global health
    health = 100
    attack = threading.Thread(target=attack_thread)
    attack.start()
    while health > 0:
        print('current health ' + str(health))
    if health <= 0:
        print('player died')

game_loop()
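Applied to the loop from the question, the same idea looks roughly like this; opening() and the damage value are stand-ins, since we do not have the real functions:
health = 100

def opening():
    global health     # without this, the module-level health never changes
    health -= 30      # stand-in for whatever damage the real combat code applies

def lose_condition():
    print('You died.')

while health > 0:
    opening()
    if health <= 0:
        lose_condition()   # fires once health actually reaches 0 or less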

Related

DDQN for Connect 4: Sudden explosion of Loss

I am trying to solve Connect 4 with DDQN through the self-play regime that was used for AlphaZero. That means I let a student version play against a teacher version of itself and replace the teacher with the student once the student wins more than 60% of the games. I get good results fairly quickly: after only ~5k games played, the agent is able to win more than 95% of games against a random player. Also, from interacting with the agent, one can see that it learns to prevent "easy wins" and finds some nice strategies.
However, after about 100,000 games played, the loss steadily increases until it eventually blows up. It is not clear to me why exactly this behaviour occurs. I have tried different learning rates (1e-5 up to 1e-3) and different replay buffer sizes (10,000 up to 1,000,000). My update rule looks as follows:
# Get predicted q values for the actions that were taken
q_pred = self.Q_eval.forward(state_batch).to(self.Q_eval.device)
q_pred = q_pred[batch_index, action_indices]
# Replace -1 and 1 for new_state_batch
new_state_batch *= -1.
q_eval = self.Q_eval.forward(new_state_batch).to(self.Q_eval.device)
# Get target q values for the actions that were taken
move_validity = torch.Tensor(new_state_batch[:, :self.n_actions] == 0).to(self.Q_eval.device)
discard_values = self.discard_value * torch.ones([self.batch_size, self.n_actions]).to(self.Q_eval.device)
q_next = self.Q_target.forward(new_state_batch).to(self.Q_eval.device)
q_eval = torch.where(move_validity == 1., q_eval, discard_values)
max_actions = torch.argmax(q_eval, dim=1)
reward_batch = torch.Tensor(reward_batch).to(self.Q_eval.device)
terminal_batch = torch.Tensor(terminal_batch).to(self.Q_eval.device)
# Using minimax algorithm
q_target = reward_batch + self.gamma * (-q_next[batch_index, max_actions]) * terminal_batch
loss = self.Q_eval.loss(q_pred, q_target.detach()).to(self.Q_eval.device)
loss.backward()
Notice that since it is a two-player game, the next state is from the opponent's perspective. Therefore I reverse the signs (i.e. let the agent make a move for the opponent) and calculate the target value by subtracting the max q-value of the next state. Consequently, if I choose an action a that allows the opponent to win the game, this action should have a negative value.
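To make the negated target concrete, here is a small toy illustration of that idea (the values are made up, and it is simplified to a plain max instead of the double-DQN action selection used in the snippet above):
import torch

gamma = 0.99
reward = torch.tensor([0.0])         # no immediate reward for a non-terminal move
not_terminal = torch.tensor([1.0])   # 1.0 means the episode continues
# Q-values of the next state from the opponent's perspective;
# suppose the opponent has a winning move worth roughly +1 for them.
q_next_opponent = torch.tensor([[0.1, 0.95, -0.2]])

best_opponent_value = q_next_opponent.max(dim=1).values         # ~0.95
target = reward + gamma * (-best_opponent_value) * not_terminal
print(target)  # ~ -0.94: a move that lets the opponent win gets a negative target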
Some other information about the hyperparameters:
I use a starting epsilon of 1 and end epsilon of 0.15 with a decay of 0.9999
I update the target network every 1000 steps
As the neural net I use a simple CNN with 6 layers and decreasing kernel sizes
Loss function is MSE
Optimizer is Adam with no scheduler
Has anyone run into similar problems and can give me advice on how to debug this? Are there any ways to make DDQN more stable (such as prioritised experience replay)?

Input within Functions - Python

I created a game that randomly picks a number, and the user has to guess that number. Normally, this would be easy, but I'm required to use functions to make it happen.
I have my code linked below. To explain:
get_num() function gives us a number (supposed to be 1 to 1000, but I have it 1 to 10 for troubleshooting)
ask_user() is an input that prompts the user to put in a number.
guess_check() is supposed to determine if your number is too high or too low.
num_guesses() is going to keep track of the number of times the user has guessed. But it's not done; you can ignore it for now.
I'm running PyCharm Community Edition 2021.3.2, for the record.
The problem: The program works fine, except it cannot tell if a number is too big or too small. When you keep guessing the same number, let's say "2", it will keep saying the number is too high and then too low. Why? I have the if statements right. If you look at the screenshot, the correct number is 7, and I have the proper if statement. Yet, it still recognizes 2 as higher than 7. Why?
Here is the code:
def main():
    answer = get_num()
    guess = ask_user()
    num_guesses = 0
    while guess != answer:
        check = guess_check(guess)
        if check == 2:
            print(f"Check is {check}. Your guess of {guess} is too high. Pick something lower. You're now on Guess {num_guesses}!\n")
            guess = ask_user()
        elif check == 1:
            print(f"Check is {check}. Your guess of {guess} is too low. Pick something higher. You're now on Guess {num_guesses}! The correct answer is {answer}\n")
            guess = ask_user()
    print(f"Check is {check}. Congratulations! Your guess of {guess} is correct! \n\nNumber of Guesses: {num_guesses}")

# This function actually gives us the number to guess
def get_num():
    # Randomly determines a number between 0 and 1000
    import random
    answer = random.randrange(1, 10)
    return answer

def ask_user():
    answer = int(input("Pick a number: "))
    return answer

def guess_check(guess):
    answer = get_num()
    if guess > answer:
        # Guess is too high, therefore value for guess_check() is 2
        check = 2
    elif guess < answer:
        # Guess is too low, therefore value for guess_check() is 1
        check = 1
    else:
        # Guess is right on, therefore value for guess_check() is 0
        check = 0
    return check

def num_guesses():
    number = 1
    guess = guess_check()
    # If statement for whether guess is too high (2) or too low (1). If it's 0, then the number of guesses will not increase.
    if guess == 1 or guess == 2:
        number += 1
    return number

main()
I've tried a few things to get around the problem:
I combined the ask_user() and guess_check() functions into 1 function. This did not make a difference.
Tried coding this exact same program without using functions. Ran just fine. The guess check part of the code ran without issues. So this tells me the functions are the reason this issue is happening.
Anyway, thanks so much for the help. You don't even know how much I appreciate this. I'm desperate.
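Reading the code above, one thing that stands out is that guess_check() calls get_num() again, so every check compares against a fresh random number rather than the one main() generated. A small sketch of the difference (the "consistent" variant is my own illustration, not from the post):
import random

def get_num():
    return random.randrange(1, 10)

def guess_check_rerolls(guess):
    answer = get_num()            # new random number on every call
    return 2 if guess > answer else 1 if guess < answer else 0

def guess_check_consistent(guess, answer):
    return 2 if guess > answer else 1 if guess < answer else 0

answer = get_num()
for _ in range(5):
    # the rerolling version can flip between "too high" (2) and "too low" (1)
    # for the same guess; the version that receives the answer cannot
    print(guess_check_rerolls(5), guess_check_consistent(5, answer))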

What does local rank mean in distributed deep learning?

https://github.com/huggingface/transformers/blob/master/examples/run_glue.py
I want to adapt this script to do text classification on my data. The computer for this task is a single machine with two graphics cards. So this involves a kind of "distributed" training with the term local_rank in the script above, especially when local_rank equals 0 or -1, as in line 83.
After reading some material on distributed computation, I guess that local_rank is like an ID for a machine, and 0 may mean this machine is the "main" or "head" in the computation. But what is -1?
Q: But what is -1?
Usually, this is used to disable the distributed setting. Indeed, as you can see here:
train_sampler = RandomSampler(train_dataset) if args.local_rank == -1 else DistributedSampler(train_dataset)
and here:
if args.local_rank != -1:
    model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.local_rank],
                                                      output_device=args.local_rank,
                                                      find_unused_parameters=True)
setting local_rank to -1 has this effect.
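For context, here is a rough sketch (not copied from run_glue.py) of how local_rank typically gets its value: a launcher such as python -m torch.distributed.launch --nproc_per_node=2 script.py passes --local_rank=<gpu index> to each process, while running the script directly leaves the default of -1 and keeps everything non-distributed:
import argparse
import torch

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1,
                    help="For distributed training: local process rank; -1 means not distributed")
args = parser.parse_args()

if args.local_rank == -1:
    # single process, possibly multiple GPUs via nn.DataParallel
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
else:
    # one process per GPU, launched by torch.distributed.launch / torchrun
    torch.cuda.set_device(args.local_rank)
    device = torch.device("cuda", args.local_rank)
    torch.distributed.init_process_group(backend="nccl")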
I want to add something more to Berriel's answer. Since you have two GPUs and not a distributed setup with multiple nodes, you do not need distributed methods like DistributedSampler. Hugging Face uses -1 to disable the distributed settings in its training mechanisms.
Check out the following code from the Hugging Face training_args.py script. As you can see, if there is a distributed training mechanism, self.local_rank gets changed.
def _setup_devices(self) -> "torch.device":
    logger.info("PyTorch: setting up devices")
    if self.no_cuda:
        device = torch.device("cpu")
        self._n_gpu = 0
    elif is_torch_tpu_available():
        device = xm.xla_device()
        self._n_gpu = 0
    elif is_sagemaker_distributed_available():
        import smdistributed.dataparallel.torch.distributed as dist
        dist.init_process_group()
        self.local_rank = dist.get_local_rank()
        device = torch.device("cuda", self.local_rank)
        self._n_gpu = 1
    elif self.local_rank == -1:
        # if n_gpu is > 1 we'll use nn.DataParallel.
        # If you only want to use a specific subset of GPUs use `CUDA_VISIBLE_DEVICES=0`
        # Explicitly set CUDA to the first (index 0) CUDA device, otherwise `set_device` will
        # trigger an error that a device index is missing. Index 0 takes into account the
        # GPUs available in the environment, so `CUDA_VISIBLE_DEVICES=1,2` with `cuda:0`
        # will use the first GPU in that env, i.e. GPU#1
        device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        # Sometimes the line in the postinit has not been run before we end up here, so just checking we're not at
        # the default value.
        self._n_gpu = torch.cuda.device_count()

State, Reward per step in a multiagent environment

(crossposted:https://ai.stackexchange.com/questions/15693/state-reward-per-step-in-a-multiagnet-environment)
In a single agent environment, the agent takes an action, then observes the next state and reward:
for ep in num_episodes:
    action = dqn.select_action(state)
    next_state, reward = env.step(action)
Implicitly, the code for moving the simulation (env) forward is embedded inside the env.step() function.
Now in the multiagent scenario, agent 1 ($a_1$) has to make a decision at time $t_{1a}$, which will finish at time $t_{2a}$, and agent 2 ($a_2$) makes a decision at time $t_{1b} < t_{1a}$ which is finished at $t_{2b} > t_{2a}$.
If both of their actions would start and finish at the same time, then it could easily be implemented as:
for ep in num_episodes:
    action1, action2 = dqn.select_action([state1, state2])
    next_state_1, reward_1, next_state_2, reward_2 = env.step([action1, action2])
because the env can execute both in parallel, wait till they are done, and then return the next states and rewards. But in the scenario that I described previously, it is not clear (at least to me) how to implement this. Here we need to explicitly track time and check at every timepoint whether an agent needs to make a decision. Just to be concrete:
for ep in num_episodes:
    for t in total_time:
        action1 = dqn.select_action(state1)
        env.step(action1)  # this step might take 5t to complete
As such, the step() function won't return the reward till 5t later. In the meantime, agent 2 comes along and has to make a decision; its reward and next state won't be observed till 10t later.
To summarize, how would one implement a multiagent environment with asynchronous actions/rewards per agent?
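One way to make the "explicitly track time" idea concrete is a toy loop like the following; the environment, durations, and rewards here are entirely made up, and only the bookkeeping pattern matters:
import random
from collections import namedtuple

Pending = namedtuple("Pending", "action finish_time")

class ToyAsyncEnv:
    # Toy stand-in for the real env: each agent's action takes a fixed number of time units.
    DURATIONS = {1: 5, 2: 10}

    def needs_decision(self, agent_id, t):
        return True                       # in this toy, an idle agent always wants a new action

    def observe(self, agent_id):
        return 0.0                        # dummy state

    def start_action(self, agent_id, action):
        return self.DURATIONS[agent_id]   # how long this action will take

    def finish_action(self, agent_id):
        return 0.0, random.random()       # dummy (next_state, reward)

def select_action(state):
    return random.choice([0, 1])          # stand-in for dqn.select_action

env = ToyAsyncEnv()
total_time = 30
pending = {1: None, 2: None}              # agent_id -> Pending or None

for t in range(total_time):
    # start a new action for every idle agent that needs a decision at time t
    for agent_id in (1, 2):
        if pending[agent_id] is None and env.needs_decision(agent_id, t):
            action = select_action(env.observe(agent_id))
            pending[agent_id] = Pending(action, t + env.start_action(agent_id, action))
    # deliver (next_state, reward) only for actions that finish at this time step
    for agent_id, p in pending.items():
        if p is not None and p.finish_time == t + 1:
            next_state, reward = env.finish_action(agent_id)
            print(f"t={t+1}: agent {agent_id} finished action {p.action}, reward={reward:.2f}")
            pending[agent_id] = None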

Is it necessary to end episodes when a collision occurs in reinforcement learning?

I have implemented a Q-learning algorithm in which the agent tries to travel as far as possible. I am using instantaneous rewards as well as a final episode reward. When the agent collides, I give a large negative collision reward and I do not stop the episode. Is it OK to do it like this, or must the episode be ended once the agent collides?
In my case I have defined a minimum reward threshold; if the reward drops below that, I end the episode.
Case 1: End episode on invalid action
If you end the game before penalizing an invalid move there is no way for the network to understand that the move was invalid.
Case 2: End episode after N invalid action
This gives it room to take a few invalid actions before the episode ends. It's analogous to playing a game: you have N lives to beat the level or you lose the game.
Case 3: Not ending the game at all after invalid actions
This may cause the agent to get lost in the environment, sometimes doing only invalid actions; you need a good termination condition to stop the episode.
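To make the threshold idea above concrete, here is a toy sketch; it reads the threshold as a bound on the cumulative reward, and the environment, penalty, and numbers are made up:
import random

MIN_RETURN = -50.0           # end the episode if the cumulative reward drops below this
COLLISION_PENALTY = -10.0    # large negative reward on collision; episode keeps running

def toy_step():
    # stand-in for env.step(): small positive reward, occasional collision
    collided = random.random() < 0.2
    return (1.0 if not collided else 0.0), collided

for episode in range(3):
    episode_return = 0.0
    for t in range(1000):
        reward, collided = toy_step()
        if collided:
            reward += COLLISION_PENALTY
        episode_return += reward
        if episode_return < MIN_RETURN:
            print(f"episode {episode}: terminated at t={t}, return={episode_return:.1f}")
            break
    else:
        print(f"episode {episode}: ran full length, return={episode_return:.1f}")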
Hope this helps