Dealing with big and varying action space - reinforcement-learning

I am trying to implement a simple Q-learning algorithm. For each state, I have a function that defines the action space, which is discrete. I have realized that the action space varies from state to state: some states have as many as 2000 possible actions, while others have as few as 10. Is this wide variation a bottleneck for training? Is 2000 possible actions in a single state too many? Or do I just need to run enough iterations to cover the larger action spaces?
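To make the setup concrete, here is a minimal sketch of tabular Q-learning in which every state brings its own action list. It is an illustration under stated assumptions, not a recommended design: `valid_actions`, `step`, the toy chain environment, and all hyperparameter values are hypothetical names and numbers chosen for the example. The only point it shows is that both the greedy choice and the bootstrap target take their maximum over the actions valid in that state, so states with 3 actions and states with 20 (or 2000) can share one table.

```python
import random
from collections import defaultdict


def q_learning(valid_actions, step, start_state, episodes=2000, alpha=0.1,
               gamma=0.95, epsilon=0.2, max_steps=100, seed=0):
    """Tabular Q-learning where valid_actions(state) returns that state's own action list."""
    rng = random.Random(seed)
    q = defaultdict(float)  # keyed by (state, action); unseen pairs default to 0.0

    for _ in range(episodes):
        state = start_state
        for _ in range(max_steps):
            actions = valid_actions(state)
            # Epsilon-greedy choice restricted to the actions valid in this state.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])

            next_state, reward, done = step(state, action)

            # The bootstrap target also maximizes over the next state's valid actions only.
            if done:
                target = reward
            else:
                target = reward + gamma * max(
                    q[(next_state, a)] for a in valid_actions(next_state))
            q[(state, action)] += alpha * (target - q[(state, action)])

            state = next_state
            if done:
                break
    return q


# Hypothetical toy chain: state 0 has 3 actions, states 1 and 2 have 20 each.
def toy_actions(state):
    return list(range(3)) if state == 0 else list(range(20))


def toy_step(state, action):
    # Only the last valid action advances; reaching state 3 ends the episode with reward 1.
    advances = action == toy_actions(state)[-1]
    next_state = state + 1 if advances else state
    done = next_state == 3
    return next_state, (1.0 if done else 0.0), done


q_table = q_learning(toy_actions, toy_step, start_state=0)
```

In this sketch the larger action sets mainly cost exploration time: with epsilon-greedy, a specific action among 20 is tried far less often than one among 3, which is the effect the question asks about.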


YOLOv1 theory of 2 bounding box predictors in each grid cell and its possible power as pseudo anchor boxes

After YOLOv1 there was a trend, for a while, of using anchor boxes as priors in later iterations (I believe the reason was both to speed up training and to detect objects of different sizes better).
However, YOLOv1 has an interesting mechanism: each grid cell has k bounding box predictors, which lets them specialize in detecting objects at different scales.
Here is what I wonder, ladies and gentlemen:
Given a very long training time, can these bounding box predictors in YOLOv1 produce better bounding boxes than YOLO9000 or its counterparts that rely on the anchor box mechanism?
In my experience, yes they can. I observed two possible optimization paths, one of which is already implemented in the latest versions of YOLOv3 and YOLOv5 by Ultralytics (https://github.com/ultralytics/yolov5):
1. What I observed was that for YOLOv3, even before training, k-means clustering can ascertain a number of "common box" shapes. Feeding these into the network as anchor masks really improved the performance of YOLOv3 on "that particular" dataset, since the non-max suppression routine had a much better chance of filtering out spurious detections for particular classes in each of the detection heads. To the best of my knowledge, this technique is implemented in the latest iterations of their bounding box regression code.
2. Suppressing certain layers. In YOLOv3, the network performs detection in three stages, with the idea of progressively detecting larger to smaller objects. YOLOv3 (and in theory v1) can benefit if, with some trial and error, you can ascertain which detection head your network prefers to use, based on the common bounding box shapes found in step 1.
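The clustering step from the first path can be sketched as follows. This is a minimal pure-Python illustration, not the Ultralytics implementation: the function names are hypothetical, and using IoU between (width, height) pairs as the similarity measure is an assumption (a common choice for anchor clustering), since the answer only says "K means clustering".

```python
import random


def iou_wh(box, anchor):
    """IoU of two boxes given as (width, height), aligned at a shared corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iterations=100, seed=0):
    """Cluster (width, height) pairs into k anchor shapes, sorted by area."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # start from k distinct boxes of the dataset
    for _ in range(iterations):
        # Assign every box to the anchor it overlaps most.
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, anchors[i]))
            clusters[best].append(box)

        # Move each anchor to the mean shape of its cluster.
        new_anchors = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean_w = sum(b[0] for b in cluster) / len(cluster)
                mean_h = sum(b[1] for b in cluster) / len(cluster)
                new_anchors.append((mean_w, mean_h))
            else:
                new_anchors.append(anchors[i])  # keep the anchor of an empty cluster

        if new_anchors == anchors:
            break  # assignments no longer change
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Hypothetical label sizes in pixels: a group of small boxes and a group of tall ones.
example_boxes = [(10, 10), (12, 10), (10, 12), (11, 11),
                 (100, 200), (110, 190), (90, 210), (100, 205)]
example_anchors = kmeans_anchors(example_boxes, k=2)
```

The resulting shapes would then be written into the detector's anchor configuration; which detection head each anchor belongs to is a separate decision that this sketch does not cover.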

How to handle changing input element numbers and multiple action in Reinforcement Learning?

Hi, respected group members. I have a query related to RL. Please help by pointing me in the right direction. I am fairly new to RL, so my question may sound silly; please bear with me.
Suppose, for example, the task is to arrange n elements on a canvas. The action that can be applied to each element is two-dimensional: [move up/down, move left/right]. The agent has a time limit to finish the task, and once time is up it is given a reward if the arrangement is right. The next task will be the same, but the number of elements and the canvas dimensions can change. How can this scenario be handled with RL, given that the number of actions changes as the number of elements changes from one task to another?
One method you could consider, depending on the details of your game: if each element has the same goal and the same actions, you could train an agent that solves the problem for a single element getting to the goal. Once trained, you can add more elements and pass each element through the network to get an action for each one.
We have implemented something very similar. The beauty of it is that you only have to train with one element, which makes training much quicker. Also, once trained, you can have any number of elements and the agent will solve the task as easily as if there were one element. It all depends on the details of your game and what you want to achieve.
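The "one policy, applied per element" idea can be sketched like this. Everything here is a hypothetical stand-in: `element_policy` is a hand-written rule playing the role of the trained single-element network, and giving each element its own target position is an assumption about the task. What the sketch shows is only the structure: the same function is called once per element, so the number of elements can change freely between tasks.

```python
def element_policy(position, goal):
    """Hypothetical stand-in for a trained single-element policy.

    Returns (dx, dy), each in {-1, 0, 1}: move left/right and up/down.
    """
    def sign(value):
        return (value > 0) - (value < 0)

    return (sign(goal[0] - position[0]), sign(goal[1] - position[1]))


def act_on_all(positions, goals, policy=element_policy):
    """Query the same policy once per element; works for any number of elements."""
    return [policy(pos, goal) for pos, goal in zip(positions, goals)]


def rollout(positions, goals, max_steps):
    """Step every element with the shared policy until all goals are met or time runs out."""
    positions = list(positions)
    goals = list(goals)
    for _ in range(max_steps):
        if positions == goals:
            break
        actions = act_on_all(positions, goals)
        positions = [(x + dx, y + dy)
                     for (x, y), (dx, dy) in zip(positions, actions)]
    return positions


# Three elements in one task, a single element in the next: the policy is unchanged.
final_three = rollout([(0, 0), (5, 5), (2, 7)], [(3, 1), (0, 0), (2, 2)], max_steps=10)
final_one = rollout([(4, 4)], [(1, 6)], max_steps=10)
```

This only works as-is when elements do not need to coordinate; if they can block or depend on each other, the per-element input would have to include some information about the others.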

Determining rare values from a noise generator? (Aka how does minecraft place ores) [closed]

I'm currently working on a 2D world generator (side scroller) using Perlin noise, and I want to be able to have rare ores spread through the terrain, but I'm not entirely sure how to do it.
What I was thinking of doing was, while looping through the 2D array and generating the world, collecting all the numbers (or number ranges) into an array, then determining the rarity of each number or range of numbers and placing ores accordingly. However...
That just seems slow and there has to be a faster, better way. Is there?
Edit: Another way to phrase my question is: how exactly does Minecraft place its ores, and what would be the best way to do this in a 2D (side-view) game using Perlin noise?
You may place the lumps according to a random probability. However, with low probabilities this means that large areas will have no lumps at all, while other places will be relatively rich in rare ores. A better algorithm might be: establish a base area, determine an acceptable density (e.g. one to three rare lumps per area), and then compute random locations for the rare lumps within each area.
An example: you decide that it is acceptable to have 1-3 rare lumps in every 1000x1000 area. Then, for each of these areas, you compute the number of rare lumps as 1+floor(random*3) (where 0<=random<1) and compute random coordinates for the resulting lumps. You can also add constraints, such as a minimum distance between rare lumps, which are not possible with random noise generators: if a rare lump would be placed too close to another one, just recompute its location.
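The per-area scheme above can be sketched in a few lines. The function name, the parameter defaults (including the minimum distance of 50) and the retry cap are assumptions made for the example; the lump count follows the 1+floor(random*3) formula from the answer, generalized to a min/max pair.

```python
import math
import random


def place_lumps(area_x, area_y, area_size=1000, min_lumps=1, max_lumps=3,
                min_distance=50.0, max_tries=100, rng=random):
    """Return lump coordinates for the area whose grid index is (area_x, area_y)."""
    # 1 + floor(random * 3) in the default case: an integer from 1 to 3.
    count = min_lumps + math.floor(rng.random() * (max_lumps - min_lumps + 1))

    lumps = []
    for _ in range(count):
        # Recompute the location until it is far enough from the lumps placed so far.
        for _ in range(max_tries):
            x = area_x * area_size + rng.random() * area_size
            y = area_y * area_size + rng.random() * area_size
            if all(math.hypot(x - ox, y - oy) >= min_distance for ox, oy in lumps):
                lumps.append((x, y))
                break
        # If max_tries is exhausted, this lump is skipped rather than looping forever.
    return lumps


# Example: lumps for the area at grid index (2, 3), using a seeded generator.
example_lumps = place_lumps(2, 3, rng=random.Random(42))
```

Passing a generator seeded from the world seed and the area index would make each area reproducible, which matters if areas are generated on demand.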
The exact formula that Minecraft uses to generate terrain is a secret, as stated in this blog post by Notch.
I think that the best way to do this in a 2d Minecraft-ish game would be to generate the world block by block, with perlin noise (take a look at this tutorial). Then when each block is created, send the X and Y to another function that will work out the likelihood of there being an ore in the block and if so, what ore it will be. If the function returns an ore you can then recreate the block as that ore.
Hope this helped. If you want more specific, coded examples just let me know!
I believe Minecraft's terrain is based on several octaves of Perlin noise generation, which is then smoothed by stretching. I assume that certain features are added in further passes, such as caves, overhangs and so on. It's very likely that you'll have to experiment with Perlin or another noise function to come up with a pleasing output for your landscape. The good thing about such noise functions is that with a given seed, they will give a predictable, repeatable output for a set of inputs, i.e. you can generate your landscape in chunks, like Minecraft's, without any discontinuities. There's more here: https://gamedev.stackexchange.com/questions/28970/c-perlin-noise-generating-endless-terrain-chunks
In terms of ore generation, having individual tiles of ore spread about in a random weighted distribution won't give you a very pleasant output. You would most likely prefer ore to be produced in natural-looking clumps. You can do this by using Perlin again to produce localised pockets or vertically-stretched veins of minable materials. You can either use the function as is and use your landscape heightmap as a cut-off point (so you don't end up with ore in mid-air!), or you can feed in your 1D landscape generation function as another input, meaning you can produce ore whose distribution is scaled for different depths. As for the rarity of the ore, you can use differing cut-off values from your 2D noise output for placement and also experiment with the frequency and persistence input values.
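A minimal sketch of the cut-off idea follows. Note the substitution: to stay self-contained it uses a small hash-based value noise, not true Perlin noise, and every name, constant and default in it is hypothetical. It shows the two points made above: tiles above the landscape surface never receive ore, and a higher cut-off on the same noise field yields a rarer ore.

```python
import math


def lattice_value(ix, iy, seed):
    """Deterministic pseudo-random value in [0, 1) for an integer lattice point."""
    h = (ix * 374761393 + iy * 668265263 + seed * 1442695041) & 0xFFFFFFFF
    h = ((h ^ (h >> 13)) * 1274126177) & 0xFFFFFFFF
    return (h ^ (h >> 16)) / 2 ** 32


def value_noise(x, y, seed=0):
    """Smoothly interpolated value noise (a simple stand-in for Perlin noise)."""
    ix, iy = math.floor(x), math.floor(y)
    fx, fy = x - ix, y - iy
    # Smoothstep weights avoid visible creases at lattice boundaries.
    sx = fx * fx * (3 - 2 * fx)
    sy = fy * fy * (3 - 2 * fy)
    top = lattice_value(ix, iy, seed) * (1 - sx) + lattice_value(ix + 1, iy, seed) * sx
    bottom = lattice_value(ix, iy + 1, seed) * (1 - sx) + lattice_value(ix + 1, iy + 1, seed) * sx
    return top * (1 - sy) + bottom * sy


def ore_at(x, y, surface_y, cutoff=0.8, frequency=0.15, seed=0):
    """True if tile (x, y) holds ore; y grows downward, surface_y is the ground level at x."""
    if y <= surface_y:
        return False  # at or above the surface: no ore in mid-air
    return value_noise(x * frequency, y * frequency, seed) > cutoff


# Same noise field, two cut-offs: the higher cut-off marks fewer tiles.
tiles = [(x, y) for x in range(64) for y in range(64)]
common_count = sum(ore_at(x, y, surface_y=10, cutoff=0.6) for x, y in tiles)
rare_count = sum(ore_at(x, y, surface_y=10, cutoff=0.9) for x, y in tiles)
```

Lowering `frequency` produces larger, sparser pockets, and scaling x and y by different factors would stretch them into veins; making `cutoff` a function of depth gives the depth-dependent distribution mentioned above.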
I like the cave generation functions given here: http://www.gamedev.net/blog/33/entry-2227887-more-on-minecraft-type-world-gen/ The ridged multi-fractals could certainly be adapted to generate ore patterns or other underground features, but there is increased complexity if you follow this approach.
Edit:
The first link may be to a question you yourself posted on gamedev! Are you the same Jon? :)

Electrically charging edges in a force-based graph drawing algorithm?

I'm attempting to write a short mini-program in Python that plays around with force-based algorithms for graph drawing.
I'm trying to minimize the number of times lines intersect. Wikipedia suggests giving the lines an electrical charge so that they repel each other. I asked my physics teacher how I might simulate this, and she mentioned using calculus with Coulomb's Law, but I'm uncertain how to start.
Could somebody give me a hint on how I could do this? (Or alternatively, another way to tweak a force-based graph drawing algorithm to minimize the number of times the lines cross?) I'm just looking for a hint; no source code please.
In case anybody's interested, my source code and a youtube vid I made about it.
You need to explicitly include a term in your cost function that minimizes the number of edge crossings. For example, for every pair of edges that cross, you incur a fixed penalty or, if the edges are weighted, you incur a penalty that is the product of the two weights.
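For readers who do want the penalty term spelled out, here is a minimal sketch. The function names and the data layout (node positions in a dict, edges as node pairs, optional weights keyed by edge) are assumptions for the example. It counts only proper crossings, so edges that merely share an endpoint are not penalized.

```python
from itertools import combinations


def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)


def segments_cross(a, b, c, d):
    """True if segment ab and segment cd properly cross (touching does not count)."""
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)


def crossing_penalty(positions, edges, weights=None, penalty=1.0):
    """Sum a penalty over every pair of crossing edges.

    With weights given, a crossing pair costs the product of its two edge weights;
    otherwise it costs the fixed penalty.
    """
    total = 0.0
    for e1, e2 in combinations(edges, 2):
        if set(e1) & set(e2):
            continue  # edges sharing a node cannot properly cross
        if segments_cross(positions[e1[0]], positions[e1[1]],
                          positions[e2[0]], positions[e2[1]]):
            total += weights[e1] * weights[e2] if weights else penalty
    return total


# The two diagonals of a unit square cross once.
square = {0: (0, 0), 1: (1, 1), 2: (0, 1), 3: (1, 0)}
diagonals = [(0, 1), (2, 3)]
fixed_cost = crossing_penalty(square, diagonals)
weighted_cost = crossing_penalty(square, diagonals, weights={(0, 1): 2.0, (2, 3): 3.0})
```

Because this term is a step function of the node positions, it gives no gradient on its own; it fits layouts that are optimized by accepting or rejecting candidate moves rather than by following forces.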

Maximum number of canvases (used as layers)?

I am writing an HTML5 canvas app in JavaScript. I am using multiple canvas elements as layers to support animation without having to redraw the whole image every frame.
Is there a maximum number of canvas elements that I can layer on top of each other in this way (and still see an appropriate result on all of the HTML5 platforms, of course)?
Thank you.
I imagine you will probably hit a practical performance ceiling long before you hit the hard specified limit of somewhere between several thousand and 2,147,483,647 ... depending on the browser and what you're measuring (number of physical elements allowed on the DOM or the maximum allowable z-index).
This is correlated to another of my favorite answers to pretty much any question that involves the phrase "maximum number" - if you have to ask, you're probably Doing It Wrong™. Taking an approach that is aligned with the intended design is almost always just as possible, and avoids these unpleasant murky questions like "will my user's iPhone melt if I try to render 32,768 canvas elements stacked on top of each other?"
This is a question of the limits of the DOM, which are large. I expect you will hit a performance bottleneck before you hit a hard limit.
The key in your situation, I would say, is to prepare some simple benchmarks/tests that dynamically generate an arbitrary number of canvases, fill them with content, and add them to the DOM. You should be able to construct your tests in such a way that A) if there is a hard limit you will spot it (using identifiable canvas content or exception handling), or B) if there is a performance limit you will spot it (using profiling or timers). Then run these tests on a variety of browsers to establish your practical "limit".
There are also great resources available here https://developers.facebook.com/html5/build/games/ from the Facebook HTML5 games initiative. Therein are links to articles and open source benchmarking tools that address and test different strategies similar to yours.