RLlib: define self.state with more than one parameter

I am currently building a reinforcement learning agent with RLlib to solve a simple movement problem. For now, the agent's state contains only its own position. However, the agent should be able to change the environment, so if I understand correctly, the state must also describe what the environment currently looks like. At position xy, action z could be good in one version of the environment. But if the agent is at the same position xy and takes action z after the environment was changed in earlier steps, the same action could be bad (negative reward). So to account for the changes already made to the environment, the environment has to be part of the state, right?
That was my reasoning, but I am not able to add my environment to the state.
My environment is currently described as a multidimensional array. If I just add this array to the state, I run into a problem with my observation_space: I can no longer say that the space ranges from x to y, because the array has no real bounds. It is just an array whose number of objects is not even fixed.
I have looked through many code examples, but none of them gave me an idea of how to solve this. Does anybody have an idea?

Related

Find dependencies between words in a sentence

I need to find connections between words in a sentence, like the dependency parse produced by the spaCy library.
How can I achieve these results with deep learning?
I don't really understand how Hugging Face transformers work, because the library relies on a "self-attention" mechanism, which is quite a mystery to me.
Maybe I should stick to RNNs, but I don't know which properties (words, lemmas, morphemes) I should pass to the network, or how to vectorize them.
I created a sample dataset in which I store each word along with its POS tag, tense, gender, case and number (plural/singular), using 0 when the word doesn't have that property, plus the word's parent (0 if it is the sentence root).
I have a few questions:
What would be an appropriate dataset size for this problem, in number of sentences?
What kind of model do I need to solve this, and how does this model learn?
I can't figure it out, so please describe everything in as much detail as possible. Thank you!
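A minimal sketch of one sentence in the dataset format described above. It assumes integer codes for the grammatical features and a 1-based word index for the parent. The codes and the `TokenRow`/`is_valid_tree` names are hypothetical. Because each word stores its parent, a sentence forms a tree. A quick consistency check on the data is therefore that there is exactly one root and that every word reaches it without cycles:

```python
from dataclasses import dataclass

@dataclass
class TokenRow:
    word: str
    pos: str
    tense: int    # hypothetical integer codes; 0 = the word has no tense
    gender: int   # 0 = not applicable
    case: int     # 0 = not applicable
    number: int   # e.g. 1 = singular, 2 = plural, 0 = not applicable
    parent: int   # assumed 1-based index of the head word, 0 for the sentence root

def is_valid_tree(rows):
    """Exactly one root, and following parents from any word ends at the root."""
    n = len(rows)
    if sum(r.parent == 0 for r in rows) != 1:
        return False
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen or not 1 <= node <= n:
                return False          # cycle or out-of-range parent index
            seen.add(node)
            node = rows[node - 1].parent
    return True

sentence = [
    TokenRow("She", "PRON", 0, 2, 1, 1, parent=2),
    TokenRow("reads", "VERB", 1, 0, 0, 1, parent=0),
    TokenRow("books", "NOUN", 0, 0, 2, 2, parent=2),
]
print(is_valid_tree(sentence))  # True
```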

Training out false positives in object detection

This is my first foray into the world of object recognition. I have successfully trained a YOLO model on images that I found on Google and annotated myself in CVAT.
My questions are as follows.
a) How do I train the model to ignore some special variant that I am specifically NOT interested in detecting? Say I am getting false positives because something looks similar to one of my objects, and I want to train so that these are not detected. Does it simply work to include images that contain the unwanted object into the training set, but don't annotate the unwanted object?
b) If so, am I right in assuming that training on annotated images in which some instances of the desired objects were missed effectively tells the training engine that I'm not interested in those objects? In other words, is it BAD if images don't have every single instance of the desired objects annotated?
c) If I include an image with an empty annotation file in my training set, and there are desired objects in that image, does that effectively discourage the model from finding them in the future?
Thanks for any thoughts.
a) This is true. During training, the model treats the area inside the bounding boxes of a class as positive for that class, and the area outside those boxes as negative for that class.
b) See a); this is indeed the case.
c) Images with empty annotation files are still used during training, but the model treats the whole image as background, so any objects in it count as negatives too.
So, in short, annotate all instances of objects of a certain class and maybe add 'background images' as negative examples to disincentivize those.
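For reference, here is a sketch of the widely used YOLO text label format: one `.txt` file per image, and one `class x_center y_center width height` line per object, with coordinates normalized to 0..1. The helper names are hypothetical. It shows why the answers above hold. Anything without its own line, including an unannotated look-alike or a missed instance, is treated as background. A background image simply gets an empty label file:

```python
from pathlib import Path

def yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) into one YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

def label_text(boxes, img_w, img_h):
    """Contents of one image's label file; no boxes gives an empty background file."""
    return "".join(yolo_line(c, b, img_w, img_h) + "\n" for c, b in boxes)

def write_labels(label_path, boxes, img_w, img_h):
    """Write the label file that sits next to its image."""
    Path(label_path).write_text(label_text(boxes, img_w, img_h))

# An image with one annotated object of class 0 ...
print(label_text([(0, (100, 50, 300, 250))], 400, 300), end="")  # 0 0.500000 0.500000 0.500000 0.666667
# ... and a background image containing only look-alikes: an empty file
print(repr(label_text([], 640, 480)))  # ''
```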

Mapping Nonlinear Functions By Using Artificial Neural Network

I am stuck on a hard assignment and don't know where to start. How can I solve the following problem? Any help would be appreciated.
f(x) = 1/x, with x between 0.1 and 1.
The task is to train a network with one hidden layer using the backpropagation algorithm.
The training set will have 200 input/output patterns, the test set 100 and the validation set 50.
How can I solve this? Regards.
That sounds much more complicated than it actually is. The network does not know anything about what you actually want to represent with the input and output patterns, so don't worry about that. All you need to do is set up such a network. I assume you know how to do that; otherwise look around, there are a couple of libraries, and it is even possible to set one up quickly in Excel for testing purposes.
Then run the training data through the network in a loop. Once the network is reasonably stable, save it and start testing it on the test data.
I assume the representation of the patterns has already been defined? It is one of the most important factors for the quality of the result. The closer the x/y pairs are semantically, the closer their representation patterns have to be, meaning here the delta between x/y pairs. This matters in particular for the pairs with a small x and a large y!
Otherwise the network will not "understand" the relationship and you can train it forever, since there is no correct representation of the similarity, in this case delta x and delta y.
For example, the value 7 in binary (0111) is not close at all to the value 8 (1000). So if the network has never seen the 8, it will not have "learned" that 8 is close to 7, and it will not work well.
So the closer two values are, the more similar their representations should be for the network! That's the key.
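To make the 7-versus-8 point concrete: in 4-bit binary, 7 and 8 differ in all four bits, while 7 and 6 differ in only one. A binary input encoding therefore puts neighbouring values far apart, whereas a single scaled real-valued input keeps them close. `hamming` and `scaled` are illustrative helpers, not part of any library:

```python
def hamming(a, b):
    """Number of bit positions in which the binary encodings of a and b differ."""
    return bin(a ^ b).count("1")

def scaled(v, lo=0, hi=15):
    """Encode v as a single real number in [0, 1]."""
    return (v - lo) / (hi - lo)

print(hamming(7, 8), hamming(7, 6))  # 4 1
print(scaled(8) - scaled(7))         # 1/15, up to floating-point rounding
```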
Tweaking the parameters will then fine-tune your model.
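Below is a minimal NumPy sketch of the setup described in the question and answer. It uses one hidden layer of tanh units trained with plain backpropagation (gradient descent on mean squared error), and 200 training, 100 test and 50 validation patterns. Several details are assumptions, not part of the assignment text: the hidden size, learning rate, epoch count, the input/output scaling, and the use of the validation set for early stopping.

```python
import numpy as np

def make_split(n, rng):
    """Sample n input/output patterns with x uniform in [0.1, 1] and y = 1/x."""
    x = rng.uniform(0.1, 1.0, size=(n, 1))
    return x, 1.0 / x

# Map x in [0.1, 1] and y in [1, 10] onto [-1, 1] so nearby x values get nearby inputs
def scale_x(x):
    return (x - 0.55) / 0.45

def scale_y(y):
    return (y - 5.5) / 4.5

def unscale_y(t):
    return t * 4.5 + 5.5

def forward(params, xs):
    w1, b1, w2, b2 = params
    h = np.tanh(xs @ w1 + b1)   # one hidden layer of tanh units
    return h, h @ w2 + b2       # linear output unit

def predict(params, x):
    return unscale_y(forward(params, scale_x(x))[1])

def mse(params, x, y):
    """Mean squared error measured on the scaled targets."""
    _, out = forward(params, scale_x(x))
    return float(np.mean((out - scale_y(y)) ** 2))

def train(x_train, y_train, x_val, y_val, hidden=10, lr=0.05, epochs=5000, seed=0):
    rng = np.random.default_rng(seed)
    params = [rng.normal(0.0, 0.5, (1, hidden)), np.zeros(hidden),
              rng.normal(0.0, 0.5, (hidden, 1)), np.zeros(1)]
    xs, ts = scale_x(x_train), scale_y(y_train)
    best_params, best_val = [p.copy() for p in params], mse(params, x_val, y_val)
    for _ in range(epochs):
        h, out = forward(params, xs)
        # Backpropagation of the mean squared error, output layer first
        d_out = 2.0 * (out - ts) / len(xs)
        d_pre = (d_out @ params[2].T) * (1.0 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
        grads = (xs.T @ d_pre, d_pre.sum(axis=0), h.T @ d_out, d_out.sum(axis=0))
        for p, g in zip(params, grads):
            p -= lr * g                                   # plain gradient descent step
        # Early stopping: keep the weights with the lowest validation error
        val = mse(params, x_val, y_val)
        if val < best_val:
            best_params, best_val = [p.copy() for p in params], val
    return best_params

rng = np.random.default_rng(42)
x_train, y_train = make_split(200, rng)   # 200 training patterns
x_test, y_test = make_split(100, rng)     # 100 test patterns
x_val, y_val = make_split(50, rng)        # 50 validation patterns
params = train(x_train, y_train, x_val, y_val)
print("test MSE (scaled):", mse(params, x_test, y_test))
print("f(0.25) approx:", predict(params, np.array([[0.25]]))[0, 0])
```

Feeding x as a single scaled real number is one way to follow the answer's advice that similar values should get similar representations. The steep region near x = 0.1 is where errors tend to be largest, so that is where tuning the hidden size or epoch count matters most.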

AS3: repeat the same code over a vector which has 2500 objects

This is my problem: I'm making a pathfinding program using the jump point search algorithm, and I need to reset every node (object) in the vector, which is a 40 by 40 grid, so I need to do the following:
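// Reset the visited flag of every node before each search.
// `nodes` (a Vector.<Node>) and the Node class are placeholders for the asker's own types.
for each (var node:Node in nodes)
{
    node.is_been_on = false;
}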
But my pathfinding may run 5 times every second with a few objects, so that's a lot of looping.
What is the CPU-friendly way to do this, or is there another solution that means I don't need to do it at all?
One of my friends says I should make a 40 by 40 Boolean array holding the is_been_on variable, so I would refer to that instead of the node. Would that be better?
Thanks for reading, and I hope you can help.
The simplest idea is to reset only the nodes that you've changed: store them in a separate array and iterate only over that. JPS should modify only a small part of the nodes.
Your friend's idea is not better, since you will still iterate over all nodes and modify each value. The values in the nodes are also Boolean (or at least I hope so), so you gain nothing except a second array (vector) of values.
Either way, I don't think modifying Boolean values is that bad, but if you really need to optimize (which I think is great), go with "reset what's changed". I can't imagine a better approach.
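The "reset only what's changed" idea is sketched here in Python for brevity; the original context is AS3. The `Grid` class and its method names are hypothetical. The search records every node it marks, and the reset walks only that list:

```python
class Grid:
    """Visited flags for a width x height grid that remembers which cells were set."""

    def __init__(self, width, height):
        self.width = width
        self.visited = [False] * (width * height)
        self.touched = []   # indices marked since the last reset

    def mark(self, x, y):
        """Called by the search (e.g. JPS) whenever it visits a node."""
        i = y * self.width + x
        if not self.visited[i]:
            self.visited[i] = True
            self.touched.append(i)

    def reset(self):
        """Clear only the touched flags; returns how many were cleared."""
        for i in self.touched:
            self.visited[i] = False
        cleared = len(self.touched)
        self.touched.clear()
        return cleared

grid = Grid(40, 40)
grid.mark(3, 4)
grid.mark(3, 5)
grid.mark(3, 4)       # already visited, so not recorded twice
print(grid.reset())   # 2, instead of looping over all 1,600 flags
```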
But why do you recalculate the path 5 times every second? Your graph is 40x40, and with A* or another algorithm you will be able to find a correct path. Once you have calculated a path, you don't need to recalculate it unless you have dynamic obstacles in the game.
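The advice to compute a path once and recompute it only when something changes can be sketched as a small cache. This is written in Python for brevity. `PathCache` is a hypothetical name, and `find_path(start, goal, obstacles)` is an assumed signature standing in for whatever A* or JPS implementation is used:

```python
class PathCache:
    """Reuse the last computed path until start, goal or the obstacle set changes."""

    def __init__(self, find_path):
        self.find_path = find_path   # any search function, e.g. A* or JPS (assumed signature)
        self._key = None
        self._path = None

    def get(self, start, goal, obstacles):
        key = (start, goal, frozenset(obstacles))
        if key != self._key:         # something changed, so run the search again
            self._path = self.find_path(start, goal, obstacles)
            self._key = key
        return self._path

calls = []
def fake_find_path(start, goal, obstacles):   # stand-in for a real search
    calls.append(start)
    return [start, goal]

cache = PathCache(fake_find_path)
cache.get((0, 0), (5, 5), {(2, 2)})
cache.get((0, 0), (5, 5), {(2, 2)})   # served from the cache
cache.get((0, 0), (5, 5), {(2, 3)})   # an obstacle moved, so recompute
print(len(calls))                     # 2
```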
If you don't know how to implement a pathfinding algorithm in an AS3 project, there are several ready-made solutions.

How to find the Shortest Path between all the nodes in a graph without having a pre-defined start or end points?

What I want to get is the path that connects all the points in my graph, but without having to tell the algorithm where to start and where to finish.
It needs to use the driving directions from the Google Maps API, but without setting a start or end point.
It is not the TSP, because I don't have a "start city" and I don't have to get back to the "start city" either.
As described in the question "Find the shortest path in a graph which visits certain nodes", I could just use permutations because I have only a few nodes. The problem is that I need to analyze several groups of these few nodes, so I would like the function to be as fast as possible.
NOTE: I'm not looking for a minimum spanning tree either, like this one: https://math.stackexchange.com/questions/130863/connecting-all-points-on-a-plane-with-shortest-path-possible
I want a path which tells me: you will save gas if you go first here, then over there, then over there, and finally there.
Question: is there any library that can help me with that? Or is it a known problem that already has an exact answer? How could I solve it?
It sounds like you want an all pairs shortest path algorithm. This is the class of shortest path algorithms that attempt to compute the shortest path (or the length of the shortest path) between every pair of vertices in the graph.
This is a well-known problem, and solutions exist. There is plenty of reading material describing other possible algorithms, and there may be implementations of Johnson's algorithm for your chosen language and development environment.
Keep in mind, this is an expensive problem, computationally speaking.
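The answer mentions Johnson's algorithm. A simpler member of the same all-pairs family is Floyd-Warshall, which is swapped in here and is not the algorithm the answer names. It is enough for a small graph and runs in O(n^3) time. The function name and the edge-list format are assumptions:

```python
import math

def floyd_warshall(n, edges):
    """All-pairs shortest path lengths for nodes 0..n-1.

    edges: iterable of (u, v, cost) directed edges; list both directions for a
    two-way road. Returns an n x n matrix, with math.inf meaning unreachable."""
    dist = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v, cost in edges:
        dist[u][v] = min(dist[u][v], cost)
    for k in range(n):                 # allow paths that pass through node k
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

edges = [(0, 1, 4), (1, 0, 4), (1, 2, 1), (2, 1, 1), (0, 2, 10), (2, 0, 10)]
d = floyd_warshall(3, edges)
print(d[0][2])   # 5, going 0 -> 1 -> 2 instead of the direct edge of cost 10
```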
If I understand you correctly, you want 1 route to visit all the nodes, without a predefined start/end and you want that to be minimal. A possible solution could be to modify your graph a bit to allow a travelling salesman algorithm to get a complete tour.
You start with your graph and add 1 extra node E. You connect that node to all other nodes in your graph and set the cost of all those edges to a very high constant M. You then unleash a travelling salesman algorithm on that graph which will give you a path P starting at E, passing all nodes and returning to E. If you remove the 2 edges in P that connected E to the rest of your path you will have what you were looking for.
A quick intuitive proof that it is indeed what you were looking for: suppose it's not the cheapest way to connect all nodes, and call the supposedly better path Q. Q and P both connect all nodes in your original graph. Let the end points of Q be A and B; both of them are connected to node E with an edge of cost M. Adding those 2 edges to Q gives a TSP tour that is cheaper than the tour P, which is impossible because P was the best.
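Below is a sketch of the dummy-node construction described above. It uses brute-force TSP, which is only practical for the handful of nodes the question mentions; a real TSP solver would replace the permutation loop. `shortest_open_path` is a hypothetical name, and the input is assumed to be a cost matrix between the original nodes:

```python
from itertools import permutations

def shortest_open_path(dist):
    """Cheapest path visiting every node once, with free start and end.

    Adds a dummy node E joined to every node at cost M, solves the TSP on the
    augmented graph by brute force, then drops the two edges that touch E."""
    n = len(dist)
    # Large, as the answer suggests; strictly, any constant works because every
    # tour uses exactly two E edges
    M = 10**9
    e = n                                    # index of the dummy node E
    aug = [list(row) + [M] for row in dist] + [[M] * n + [0]]
    best_cost, best_tour = float("inf"), None
    for perm in permutations(range(n)):      # fix E as the start of the tour
        tour = (e,) + perm + (e,)
        cost = sum(aug[a][b] for a, b in zip(tour, tour[1:]))
        if cost < best_cost:
            best_cost, best_tour = cost, tour
    return best_cost - 2 * M, list(best_tour[1:-1])

points = [0, 1, 3, 7]                        # positions on a line
dist = [[abs(a - b) for b in points] for a in points]
print(shortest_open_path(dist))              # (7, [0, 1, 2, 3])
```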
As you are using Google Maps, your particular instance of the TSP might satisfy the triangle inequality.
Are you really talking about distances, or about travel times?
In the case of distances:
try Googling: "triangle traveling salesman problem"
IMPORTANT: the result is a very good approximation of the best solution with a guaranteed upper bound on its cost, but it is not always the best.
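Whether those approximation guarantees apply depends on the triangle inequality actually holding for your cost matrix, which is cheap to check for a few nodes. Below is a sketch with a hypothetical function name. A matrix of shortest-path lengths (for example, the output of an all-pairs shortest path algorithm) always satisfies the inequality.

```python
def satisfies_triangle_inequality(dist, tol=1e-9):
    """True if dist[i][j] <= dist[i][k] + dist[k][j] for all i, j, k."""
    n = len(dist)
    return all(dist[i][j] <= dist[i][k] + dist[k][j] + tol
               for i in range(n) for j in range(n) for k in range(n))

print(satisfies_triangle_inequality([[0, 1, 2], [1, 0, 1], [2, 1, 0]]))  # True
print(satisfies_triangle_inequality([[0, 1, 5], [1, 0, 1], [5, 1, 0]]))  # False
```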
One way to go would be to use self-organizing (Kohonen) networks.
Assume you have n cities on a map (works the same in any dimension).
Take a chain of n connected "neurons" and place it randomly on the map.
Then you do several iterations; one iteration consists of:
1. Choose any city (e.g. go through them in order).
2. Determine the "closest" neuron; call it x (e.g. by Euclidean distance).
3. Move x closer to the city (e.g. take the direction vector from the neuron to the city, multiply it by a learning rate between 0 and 1, and move the neuron by that amount).
4. Move the neighbors of this neuron towards the city as well, but less than in step 3, depending on their distance from the current closest neuron x.
One can choose various functions in steps 2, 3 and 4.
Note also that this might not give the globally shortest path, since the result depends on where the starting chain is placed, among other things. To address this, one may do several runs with different starting conditions or, depending on the problem, help a bit with prior knowledge.
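Below is a compact NumPy sketch of the Kohonen chain described above, with several assumptions: Euclidean distance in step 2, a Gaussian neighbourhood over chain positions for steps 3 and 4, and a learning rate and neighbourhood radius that shrink over time. As the answer notes, all of these are choices. Cities are read off in the order of their closest neuron along the chain. `som_path` and its parameters are hypothetical:

```python
import numpy as np

def som_path(cities, iterations=2000, lr=0.8, seed=0):
    """Order cities with a 1-D Kohonen chain of as many neurons as cities."""
    cities = np.asarray(cities, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(cities)
    # Place the chain of neurons randomly inside the cities' bounding box
    neurons = rng.uniform(cities.min(axis=0), cities.max(axis=0), size=cities.shape)
    radius = n / 2.0                  # neighbourhood size along the chain
    positions = np.arange(n)          # index of each neuron in the chain
    for t in range(iterations):
        city = cities[t % n]                                     # step 1: cities in order
        x = np.argmin(np.linalg.norm(neurons - city, axis=1))    # step 2: closest neuron
        # Steps 3 and 4: the winner moves by lr, its chain neighbours by less
        influence = np.exp(-((positions - x) ** 2) / (2.0 * max(radius, 1e-3) ** 2))
        neurons += lr * influence[:, None] * (city - neurons)
        lr *= 0.999                   # shrink the step size over time
        radius *= 0.997               # shrink the neighbourhood over time
    # Visit cities in the order of their closest neuron along the chain
    winners = [np.argmin(np.linalg.norm(neurons - c, axis=1)) for c in cities]
    return [int(i) for i in np.argsort(winners, kind="stable")]

rng = np.random.default_rng(1)
cities = rng.uniform(0, 10, size=(8, 2))
print(som_path(cities))
```

Different seeds place the starting chain differently. In line with the answer's advice, running it several times and keeping the shortest resulting path is a reasonable use of this sketch.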
I hope this helps to complete this question for further readers...