RLlib: multiple agents with different training algorithms - reinforcement-learning

In RLlib, is it possible to have multiple agents with different learning algorithms (for example, one agent using DQN and one agent using Q-learning)?

Yes, it is possible. See this example from the documentation:
trainer = pg.PGAgent(env="my_multiagent_env", config={
    "multiagent": {
        "policies": {
            # the first tuple value is None -> uses default policy
            "car1": (None, car_obs_space, car_act_space, {"gamma": 0.85}),
            "car2": (None, car_obs_space, car_act_space, {"gamma": 0.99}),
            "traffic_light": (None, tl_obs_space, tl_act_space, {}),
        },
        "policy_mapping_fn":
            lambda agent_id:
                "traffic_light"  # Traffic lights are always controlled by this policy
                if agent_id.startswith("traffic_light_")
                else random.choice(["car1", "car2"])  # Randomly choose from car policies
    },
})

while True:
    print(trainer.train())
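in which each policy is trained by PG with its own settings (here, a different gamma). The first tuple value, None in this example, is where a non-default policy class would go.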

Related

Difference between torch.Tensor([1,2,3]) and torch.tensor([1,2,3])?

I want to understand the significance of torch.Tensor([1,2,3]) versus torch.tensor([1,2,3]).
The one difference I found is that torch.Tensor() creates tensors with int64 dtype and torch.tensor() creates float32 dtype by default.
Are there any other significant differences between the two, and when should each one be used?
It's exactly the other way around :)
torch.Tensor() returns a tensor that can hold 32-bit floating-point numbers as it is an alias for torch.FloatTensor.
torch.tensor(X) (with only integers in X) returns a 64-bit integer tensor by default as torch.tensor() infers the data type automatically.
But the initialization phase is really the only difference between the options. torch.tensor() is a factory function that creates a tensor from pre-existing data, and it is usually the recommended choice because it lets you specify, for example, the data type via the dtype argument. To create a tensor without data, on the other hand, you would use torch.Tensor() (or a factory function such as torch.empty()). Either way, you end up with a torch.Tensor.
print(torch.Tensor([1, 2, 3]).dtype) # torch.float32
print(torch.FloatTensor([1, 2, 3]).dtype) # torch.float32
print(torch.tensor([1, 2, 3], dtype=torch.float32).dtype) # torch.float32
print(torch.equal(torch.Tensor([1, 2, 3]), torch.FloatTensor([1, 2, 3]))) # True
print(torch.equal(torch.Tensor([1, 2, 3]), torch.tensor([1, 2, 3], dtype=torch.float32))) # True
print(torch.tensor([1, 2, 3]).dtype) # torch.int64
print(torch.LongTensor([1, 2, 3]).dtype) # torch.int64
print(torch.equal(torch.tensor([1, 2, 3]), torch.LongTensor([1, 2, 3]))) # True
print(torch.Tensor()) # tensor([])
print(torch.tensor()) # throws an error

When using the frame skipping wrapper for OpenAI Gym, what is the purpose of the np.max line?

I'm implementing the following wrapper used commonly in OpenAI's Gym for Frame Skipping. It can be found in dqn/atari_wrappers.py
I'm very confused about the following line:
max_frame = np.max(np.stack(self._obs_buffer), axis=0)
I have added comments throughout the code for the parts I understand and to aid anyone who may be able to help.
np.stack(self._obs_buffer) stacks the two states in _obs_buffer.
np.max returns the maximum along axis 0.
But what I don't understand is why we're doing this or what it's really doing.
from collections import deque

import gym
import numpy as np


class MaxAndSkipEnv(gym.Wrapper):
    """Return only every 4th frame"""

    def __init__(self, env=None, skip=4):
        super(MaxAndSkipEnv, self).__init__(env)
        # Initialise a double ended queue that can store a maximum of two states
        self._obs_buffer = deque(maxlen=2)
        # _skip = 4
        self._skip = skip

    def _step(self, action):
        total_reward = 0.0
        done = None
        for _ in range(self._skip):
            # Take a step
            obs, reward, done, info = self.env.step(action)
            # Append the new state to the double ended queue buffer
            self._obs_buffer.append(obs)
            # Update the total reward by summing the (reward obtained from the step taken) + (the current
            # total reward)
            total_reward += reward
            # If the game ends, break the for loop
            if done:
                break
        max_frame = np.max(np.stack(self._obs_buffer), axis=0)
        return max_frame, total_reward, done, info
At the end of the for loop, self._obs_buffer holds the last two frames.
Those two frames are then max-pooled element-wise, resulting in a single observation that contains some temporal information.
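Here is a minimal sketch of what that element-wise maximum does, using plain Python lists in place of NumPy arrays. The two tiny frames and the helper name max_over_frames are made up for illustration. The assumption, taken from the usual explanation of Atari preprocessing rather than from the wrapper itself, is that some sprites are only drawn on alternating frames, so the maximum keeps anything that is visible in either frame.

```python
def max_over_frames(frames):
    """Element-wise maximum over a list of equally shaped 2D frames.

    Mirrors np.max(np.stack(frames), axis=0) for nested lists.
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [max(frame[r][c] for frame in frames) for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical 2x3 frames: one sprite (value 255) is drawn only in frame_a,
# another one only in frame_b.
frame_a = [[0, 255, 0],
           [0, 0, 0]]
frame_b = [[0, 0, 0],
           [0, 0, 255]]

max_frame = max_over_frames([frame_a, frame_b])
print(max_frame)  # [[0, 255, 0], [0, 0, 255]]
```

Both sprites survive in max_frame, even though neither frame on its own shows both.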

Why do we "pack" the sequences in PyTorch?

I was trying to replicate How to use packing for variable-length sequence inputs for rnn but I guess I first need to understand why we need to "pack" the sequence.
I understand why we "pad" them but why is "packing" (via pack_padded_sequence) necessary?
I have stumbled upon this problem too and below is what I figured out.
When training an RNN (LSTM, GRU, or vanilla RNN), it is difficult to batch variable-length sequences. For example, if the lengths of the sequences in a batch of size 8 are [4,6,8,5,4,3,7,8], you will pad all the sequences, which results in 8 sequences of length 8. You would end up doing 64 computations (8x8), but you needed to do only 45. Moreover, if you wanted to do something fancy like using a bidirectional RNN, it would be harder to do batch computations just by padding, and you might end up doing more computations than required.
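The 64 versus 45 figures can be checked with a few lines of plain Python. This is only a counting sketch; "computation" here means one time step for one sequence, as in the paragraph above.

```python
lengths = [4, 6, 8, 5, 4, 3, 7, 8]

# With padding, every sequence is processed for max(lengths) steps.
padded_steps = len(lengths) * max(lengths)
# Without padding, each sequence only needs its own number of steps.
actual_steps = sum(lengths)
wasted_steps = padded_steps - actual_steps

print(padded_steps, actual_steps, wasted_steps)  # 64 45 19
```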
Instead, PyTorch allows us to pack the sequence. Internally, a packed sequence is a tuple of two lists. One contains the elements of the sequences, interleaved by time step (see example below); the other contains the batch size at each step. This is helpful in recovering the actual sequences as well as telling the RNN what the batch size is at each time step. This has been pointed out by @Aerin. The packed sequence can be passed to an RNN, which will internally optimize the computations.
I might have been unclear at some points, so let me know and I can add more explanations.
Here's a code example:
>>> a = [torch.tensor([1,2,3]), torch.tensor([3,4])]
>>> b = torch.nn.utils.rnn.pad_sequence(a, batch_first=True)
>>> b
tensor([[ 1, 2, 3],
        [ 3, 4, 0]])
>>> torch.nn.utils.rnn.pack_padded_sequence(b, batch_first=True, lengths=[3,2])
PackedSequence(data=tensor([ 1, 3, 2, 4, 3]), batch_sizes=tensor([ 2, 2, 1]))
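To see where the interleaved data order comes from, here is a rough pure-Python emulation of the packed layout. It is a sketch of the idea only, not PyTorch's implementation; the function name pack_sequences is made up, and it assumes a non-empty batch that is already sorted by decreasing length.

```python
def pack_sequences(sequences):
    """Emulate the (data, batch_sizes) layout of a packed sequence.

    Assumes `sequences` is non-empty and sorted by decreasing length.
    """
    max_len = len(sequences[0])
    data, batch_sizes = [], []
    for t in range(max_len):
        # Only sequences that still have an element at step t contribute.
        step_items = [seq[t] for seq in sequences if len(seq) > t]
        data.extend(step_items)
        batch_sizes.append(len(step_items))
    return data, batch_sizes


data, batch_sizes = pack_sequences([[1, 2, 3], [3, 4]])
print(data)         # [1, 3, 2, 4, 3]
print(batch_sizes)  # [2, 2, 1]
```

The result matches the PackedSequence printed above: the elements are grouped step by step, and the padding value never appears.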
Here are some visual explanations [1] that might help to develop better intuition for the functionality of pack_padded_sequence().
TL;DR: It is performed primarily to save compute. Consequently, the time required for training neural network models is also (drastically) reduced, especially when carried out on very large (a.k.a. web-scale) datasets.
Let's assume we have 6 sequences (of variable lengths) in total. You can also consider this number 6 as the batch_size hyperparameter. (The batch_size will vary depending on the length of the sequence (cf. Fig.2 below))
Now, we want to pass these sequences to some recurrent neural network architecture(s). To do so, we have to pad all of the sequences (typically with 0s) in our batch to the maximum sequence length in our batch (max(sequence_lengths)), which in the below figure is 9.
So, the data preparation work should be complete by now, right? Not really, because there is still one pressing problem: how much compute we have to do compared to the computations that are actually required.
For the sake of understanding, let's also assume that we will matrix-multiply the above padded_batch_of_sequences of shape (6, 9) with a weight matrix W of shape (9, 3).
Thus, for each output column of W, we will have to perform 6x9 = 54 multiplications and 6x8 = 48 additions (nrows x (n-1)_cols), only to throw away most of the computed results since they would be 0s (where we have pads). The actual required compute in this case is as follows:
 9-mult   8-add
 8-mult   7-add
 6-mult   5-add
 4-mult   3-add
 3-mult   2-add
 2-mult   1-add
---------------
32-mult  26-add
---------------
# savings: 22-mult & 22-add ops
#          (54-32)   (48-26)
That's a LOT more savings even for this very simple (toy) example. You can now imagine how much compute (eventually: cost, energy, time, carbon emission etc.) can be saved using pack_padded_sequence() for large tensors with millions of entries, and million+ systems all over the world doing that, again and again.
The functionality of pack_padded_sequence() can be understood from the figure below, with the help of the used color-coding:
As a result of using pack_padded_sequence(), we get a tuple of tensors containing (i) the flattened (along axis-1, in the above figure) sequences and (ii) the corresponding batch sizes, tensor([6,6,5,4,3,3,2,2,1]) for the above example.
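The batch sizes quoted above follow directly from the six sequence lengths in the toy example (9, 8, 6, 4, 3, 2). A small sketch, with a made-up helper name, that counts how many sequences are still active at each time step:

```python
def batch_sizes_from_lengths(lengths):
    """Number of sequences that still have an element at each time step."""
    return [sum(1 for n in lengths if n > t) for t in range(max(lengths))]


lengths = [9, 8, 6, 4, 3, 2]
batch_sizes = batch_sizes_from_lengths(lengths)
print(batch_sizes)       # [6, 6, 5, 4, 3, 3, 2, 2, 1]
print(sum(batch_sizes))  # 32
```

The sum, 32, is the number of non-padding elements, which is the same 32 that appears as the multiplication count in the tally above.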
The data tensor (i.e. the flattened sequences) could then be passed to objective functions such as CrossEntropy for loss calculations.
[1] Image credits to @sgrvinod
The above answers addressed the question why very well. I just want to add an example for better understanding the use of pack_padded_sequence.
Let's take an example
Note: by default, pack_padded_sequence requires the sequences in the batch to be sorted in descending order of length (newer PyTorch versions also accept unsorted input when enforce_sorted=False is passed). In the example below, the batch is already sorted to reduce clutter. Visit this gist link for the full implementation.
First, we create a batch of 2 sequences with different lengths, as below. We have 7 elements in the batch in total.
Each sequence has embedding size of 2.
The first sequence has the length: 5
The second sequence has the length: 2
import torch
seq_batch = [torch.tensor([[1, 1],
                           [2, 2],
                           [3, 3],
                           [4, 4],
                           [5, 5]]),
             torch.tensor([[10, 10],
                           [20, 20]])]
seq_lens = [5, 2]
We pad seq_batch to get a batch of sequences with an equal length of 5 (the max length in the batch). Now, the new batch has 10 elements in total.
# pad the seq_batch
padded_seq_batch = torch.nn.utils.rnn.pad_sequence(seq_batch, batch_first=True)
"""
>>>padded_seq_batch
tensor([[[ 1, 1],
[ 2, 2],
[ 3, 3],
[ 4, 4],
[ 5, 5]],
[[10, 10],
[20, 20],
[ 0, 0],
[ 0, 0],
[ 0, 0]]])
"""
Then, we pack padded_seq_batch. It returns a tuple of two tensors:
The first is the data, containing all the elements in the sequence batch.
The second is batch_sizes, which tells how many sequences are still active at each time step and so how the elements relate to each other.
# pack the padded_seq_batch
packed_seq_batch = torch.nn.utils.rnn.pack_padded_sequence(padded_seq_batch, lengths=seq_lens, batch_first=True)
"""
>>> packed_seq_batch
PackedSequence(
data=tensor([[ 1, 1],
[10, 10],
[ 2, 2],
[20, 20],
[ 3, 3],
[ 4, 4],
[ 5, 5]]),
batch_sizes=tensor([2, 2, 1, 1, 1]))
"""
Now, we pass packed_seq_batch to a recurrent module in PyTorch, such as RNN or LSTM. This only requires 5 + 2 = 7 computations in the recurrent module.
lstm = torch.nn.LSTM(input_size=2, hidden_size=3, batch_first=True)
output, (hn, cn) = lstm(packed_seq_batch.float())  # pass a float tensor instead of a long tensor
"""
>>> output # PackedSequence
PackedSequence(data=tensor(
[[-3.6256e-02, 1.5403e-01, 1.6556e-02],
[-6.3486e-05, 4.0227e-03, 1.2513e-01],
[-5.3134e-02, 1.6058e-01, 2.0192e-01],
[-4.3123e-05, 2.3017e-05, 1.4112e-01],
[-5.9372e-02, 1.0934e-01, 4.1991e-01],
[-6.0768e-02, 7.0689e-02, 5.9374e-01],
[-6.0125e-02, 4.6476e-02, 7.1243e-01]], grad_fn=<CatBackward>), batch_sizes=tensor([2, 2, 1, 1, 1]))
>>>hn
tensor([[[-6.0125e-02, 4.6476e-02, 7.1243e-01],
[-4.3123e-05, 2.3017e-05, 1.4112e-01]]], grad_fn=<StackBackward>),
>>>cn
tensor([[[-1.8826e-01, 5.8109e-02, 1.2209e+00],
[-2.2475e-04, 2.3041e-05, 1.4254e-01]]], grad_fn=<StackBackward>)))
"""
We need to convert output back to the padded batch of output:
padded_output, output_lens = torch.nn.utils.rnn.pad_packed_sequence(output, batch_first=True, total_length=5)
"""
>>> padded_output
tensor([[[-3.6256e-02, 1.5403e-01, 1.6556e-02],
[-5.3134e-02, 1.6058e-01, 2.0192e-01],
[-5.9372e-02, 1.0934e-01, 4.1991e-01],
[-6.0768e-02, 7.0689e-02, 5.9374e-01],
[-6.0125e-02, 4.6476e-02, 7.1243e-01]],
[[-6.3486e-05, 4.0227e-03, 1.2513e-01],
[-4.3123e-05, 2.3017e-05, 1.4112e-01],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00]]],
grad_fn=<TransposeBackward0>)
>>> output_lens
tensor([5, 2])
"""
Compare this effort with the standard way
In the standard way, we only need to pass padded_seq_batch to the lstm module. However, it requires 10 computations, several of them on padding elements, which is computationally inefficient.
Note that padding does not lead to inaccurate representations, but it needs much more logic to extract the correct ones.
For an LSTM (or any recurrent module) with only a forward direction, if we would like to use the hidden vector of the last step as the representation of a sequence, we have to pick the hidden vector from the T-th step, where T is the length of that input. Picking the vector at the last (padded) step would be incorrect. Note that T differs between inputs in the batch.
For a bidirectional LSTM (or any recurrent module), it is even more cumbersome, as one would have to maintain two RNN modules, one that works with padding at the beginning of the input and one with padding at the end, and finally extract and concatenate the hidden vectors as explained above.
Let's see the difference:
# The standard approach: using padding batch for recurrent modules
output, (hn, cn) = lstm(padded_seq_batch.float())
"""
>>> output
tensor([[[-3.6256e-02, 1.5403e-01, 1.6556e-02],
[-5.3134e-02, 1.6058e-01, 2.0192e-01],
[-5.9372e-02, 1.0934e-01, 4.1991e-01],
[-6.0768e-02, 7.0689e-02, 5.9374e-01],
[-6.0125e-02, 4.6476e-02, 7.1243e-01]],
[[-6.3486e-05, 4.0227e-03, 1.2513e-01],
[-4.3123e-05, 2.3017e-05, 1.4112e-01],
[-4.1217e-02, 1.0726e-01, -1.2697e-01],
[-7.7770e-02, 1.5477e-01, -2.2911e-01],
[-9.9957e-02, 1.7440e-01, -2.7972e-01]]],
grad_fn=<TransposeBackward0>)
>>> hn
tensor([[[-0.0601, 0.0465, 0.7124],
[-0.1000, 0.1744, -0.2797]]], grad_fn=<StackBackward>),
>>> cn
tensor([[[-0.1883, 0.0581, 1.2209],
[-0.2531, 0.3600, -0.4141]]], grad_fn=<StackBackward>))
"""
The above results show that hn and cn differ between the two approaches, while output differs only at the padding positions.
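The extra logic that the padded approach needs, picking the output at step T for each sequence instead of the final step, can be sketched with nested lists standing in for the output tensor. The values and the helper name last_valid_outputs are invented for illustration; with real tensors the same idea would index the time axis at lengths - 1.

```python
def last_valid_outputs(padded_output, lengths):
    """Pick padded_output[i][lengths[i] - 1] for every sequence i."""
    return [seq_out[n - 1] for seq_out, n in zip(padded_output, lengths)]


# Toy stand-in for an RNN output of shape (batch=2, time=5, hidden=1).
padded_output = [
    [[0.1], [0.2], [0.3], [0.4], [0.5]],  # real length 5
    [[1.0], [2.0], [9.9], [9.9], [9.9]],  # real length 2, the rest is padding
]
lengths = [5, 2]

print(last_valid_outputs(padded_output, lengths))    # [[0.5], [2.0]]
print([seq_out[-1] for seq_out in padded_output])    # [[0.5], [9.9]]
```

The second print shows the naive choice: taking the final time step returns a value computed on padding for the shorter sequence.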
Adding to Umang's answer, I found this important to note.
The first item in the returned tuple of pack_padded_sequence is a data (tensor) -- a tensor containing the packed sequence. The second item is a tensor of integers holding information about the batch size at each sequence step.
What's important here though is the second item (Batch sizes) represents the number of elements at each sequence step in the batch, not the varying sequence lengths passed to pack_padded_sequence.
For instance, given the data abc and x, the PackedSequence would contain the data axbc with batch_sizes=[2,1,1].
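The two views carry the same information, so the per-sequence lengths can be recovered from the batch sizes. A small sketch (the helper name is made up, and it assumes the longest sequence comes first):

```python
def lengths_from_batch_sizes(batch_sizes):
    """Recover per-sequence lengths (longest first) from per-step batch sizes."""
    num_sequences = batch_sizes[0]
    # Sequence i is present at every step whose batch size is larger than i.
    return [sum(1 for b in batch_sizes if b > i) for i in range(num_sequences)]


print(lengths_from_batch_sizes([2, 1, 1]))        # [3, 1]
print(lengths_from_batch_sizes([2, 2, 1, 1, 1]))  # [5, 2]
```

For the abc and x example, batch_sizes=[2,1,1] maps back to lengths 3 and 1.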
I used pack padded sequence as follows.
packed_embedded = nn.utils.rnn.pack_padded_sequence(seq, text_lengths)
packed_output, hidden = self.rnn(packed_embedded)
where text_lengths holds the length of each individual sequence before padding, and the sequences are sorted in decreasing order of length within a given batch.
You can check out an example here.
We do packing so that the RNN doesn't see the padded positions while processing the sequence, which would otherwise affect the overall performance.

How does one specify the input when using a CSV with Kur

I'm trying to feed a CSV file to Kur, but I don't know how to specify more than one column in the input without the program crashing. Here's a small example:
model:
  - input:
      - SepalWidthCm
      - SepalLengthCm
  - dense: 10
  - activation: tanh
  - dense: 3
  - activation: tanh
    name: Species

train:
  data:
    - csv:
        path: Iris.csv
        header: yes
  epochs: 1000
  weights: best.w
  log: tutorial-log

loss:
  - target: Species
    name: mean_squared_error
The error:
File "/Users/bytter/.pyenv/versions/3.5.2/bin/kur", line 11, in <module>
sys.exit(main())
File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/__main__.py", line 269, in main
sys.exit(args.func(args) or 0)
File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/__main__.py", line 48, in train
func = spec.get_training_function()
File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/kurfile.py", line 282, in get_training_function
model = self.get_model(provider)
File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/kurfile.py", line 148, in get_model
self.model.build()
File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/model/model.py", line 282, in build
self.build_graph(input_nodes, output_nodes, network)
File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/model/model.py", line 356, in build_graph
for layer in node.container.build(self):
File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/containers/container.py", line 281, in build
self._built = list(self._build(model))
File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/containers/layers/placeholder.py", line 122, in _build
'Placeholder "{}" requires a shape.'.format(self.name))
kur.containers.parsing_error.ParsingError: Placeholder "..input.0" requires a shape.
Using - input: SepalWidthCm works as expected.
The problem with your approach is that Kur doesn't know how you want the inputs concatenated. Should your input become a 2D tensor of dimensions (2, N) (where N is the number of data points in your CSV file), like this?
[
[SepalWidthCm_0, SepalWidthCm_1, ...],
[SepalLengthCm_0, SepalLengthCm_1, ...]
]
(N.B., that example isn't a very deep-learning friendly structure.) Or should it be combined into a tensor of dimensions (N, 2), like this?
[
[SepalWidthCm_0, SepalLengthCm_0],
[SepalWidthCm_1, SepalLengthCm_1],
...
]
Or maybe you want to apply the same operations to each column in parallel? Regardless, this problem gets a lot harder / more ambiguous to answer when your input data is multi-dimensional (e.g., instead of scalars like length or width, you have vectors or even matrices).
Instead of trying to guess what you want (and possibly getting it wrong), Kur expects each input to be a single data source, which you can then combine however you see fit.
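To make the ambiguity concrete, here is a short plain-Python sketch of the two layouts. The sample values are made up, and nothing here is Kur-specific; it only shows the (2, N) and (N, 2) arrangements of the same two columns.

```python
sepal_width = [3.5, 3.0, 3.2]   # made-up sample values
sepal_length = [5.1, 4.9, 4.7]

# Layout 1: shape (2, N), one row per CSV column.
by_column = [sepal_width, sepal_length]

# Layout 2: shape (N, 2), one row per data point.
by_row = [list(pair) for pair in zip(sepal_width, sepal_length)]

print(by_column)  # [[3.5, 3.0, 3.2], [5.1, 4.9, 4.7]]
print(by_row)     # [[3.5, 5.1], [3.0, 4.9], [3.2, 4.7]]
```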
Here are a couple ways you might want your data combined, and how to do it in Kur.
Row-wise Combination. This is the second example above, where we want to combine "rows" of CSV data into tuples, so that the input tensor has dimensionality (batchSize, 2). Then your Kur model would look like:
model:
  # Define the model inputs.
  - input: SepalWidthCm
  - input: SepalLengthCm

  # Concatenate the inputs.
  - merge: concat
    inputs: [SepalWidthCm, SepalLengthCm]

  # Do processing on these "vectorized" inputs.
  - dense: 10
  - activation: tanh
  - dense: 1
  - activation: tanh

  # Output
  - output: Species
Independent Processing, and then Combining. This is the setup where you do some operations on each input column independently, and then you merge them together (potentially with some more operations afterwards). In ASCII-art, this might look like:
INPUT_1 --> dense, activation --\
                                 +---> dense, activation --> OUTPUT
INPUT_2 --> dense, activation --/
In this case, you would have a Kur model that looks like this:
model:
  # First "branch" of processing.
  - input: SepalWidthCm
  - dense: 10
  - activation: tanh
    name: WidthBranch

  # Second "branch" of processing.
  - input: SepalLengthCm
  - dense: 10
  - activation: tanh
    name: LengthBranch

  # Fuse things together.
  - merge:
    inputs: [WidthBranch, LengthBranch]

  # Continue some processing
  - dense: 1
  - activation: tanh

  # Output
  - output: Species
Keep in mind that the merge layer has been around since Kur 0.3, so make sure you are using a recent version.
(Disclaimer: I am the core maintainer of Kur.)

Allow all traffic to and from the instance using boto

The following code works as expected.
import boto.ec2
conn = boto.ec2.connect_to_region("us-east-1", aws_access_key_id='xxx', aws_secret_access_key='zzz')
sg = conn.create_security_group('test_delete', 'description')
auth = conn.authorize_security_group(sg.name, None, None, ip_protocol='tcp', from_port='22', to_port='22', cidr_ip='0.0.0.0/0')
I can select "All traffic" option from user interface. There is no equivalent here in boto.
I am aware of the security risks involved, but for some reason I want to open all ports (to / from) for all traffic using boto.
Use 'IpProtocol': '-1' for the "All traffic" option; see the code below for details.
import boto3
from botocore.exceptions import ClientError

def create_ingress_rules(credentials=None, securitygroupid=None, region_name=None):
    print("3-Start creating ingress rule(s)...")
    create_ingress_rules_handler = boto3.client(
        'ec2',
        aws_access_key_id=credentials['AccessKeyId'],
        aws_secret_access_key=credentials['SecretAccessKey'],
        aws_session_token=credentials['SessionToken'],
        region_name=region_name)
    try:
        data = create_ingress_rules_handler.authorize_security_group_ingress(
            GroupId=securitygroupid,
            IpPermissions=[
                {'IpProtocol': '-1',  # -1 means all protocols
                 'FromPort': 0,
                 'ToPort': 65535,
                 'IpRanges': [{'CidrIp': '0.0.0.0/0', 'Description': 'Temporary inbound rule for Guardrail Testing'}]}
            ])
        print('Complete creating Ingress rule...')
    except ClientError as e:
        print(e)
I think you just have to specify the min and max values for a port number. Since it is a 16-bit value, the value can range from 0 to 65535. So:
auth = conn.authorize_security_group(sg.name, None, None, ip_protocol='tcp', from_port=0, to_port=65535, cidr_ip='0.0.0.0/0')
Should allow traffic on all ports for the TCP protocol.
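Going back to the 'IpProtocol': '-1' approach, here is a small sketch that only builds the permission entry, so it runs without AWS credentials. The helper name all_traffic_permission is made up. As far as I know, EC2 applies such a rule to all ports regardless of any port range, which is why the sketch leaves FromPort and ToPort out; treat that as an assumption and check it against the EC2 documentation.

```python
def all_traffic_permission(cidr_ip="0.0.0.0/0", description=""):
    """Build an IpPermissions entry that matches every protocol and port."""
    ip_range = {"CidrIp": cidr_ip}
    if description:
        ip_range["Description"] = description
    # "-1" means all protocols; no port range is given in this sketch.
    return {"IpProtocol": "-1", "IpRanges": [ip_range]}


permission = all_traffic_permission(description="Temporary open rule")
print(permission)
```

The resulting dict would be passed as IpPermissions=[permission] to the boto3 EC2 client's authorize_security_group_ingress call for inbound traffic, and to authorize_security_group_egress for outbound traffic.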