Can I train DQN without updating training agent? - reinforcement-learning

I'm a newbie in RL, so please forgive me if I ask a stupid question :)
I'm working on a DQN project right now, and it's very similar to the simplest snake game. The game is written in JS and has a demo (in which the snake moves randomly). But since I don't know how to write JS, I can't pass action values to the game during the training process, so what I'm doing now is generating random game images and training the DQN model on those instead.
What I want to ask is: Is it possible to train this way? Can Q(s,a) still converge? If it is possible, is there anything I should pay attention to, and do I still need the epsilon parameter?
Thank you very much :)

I'd definitely say no!
The problem is that the agent will only ever learn from random decisions and can never test whether a learned action produces even more reward. So everything it learns will be biased toward the starting frames.
Further, in your case the agent will never learn how to handle its size (if it grows like in snake), because random play will rarely let it grow at all.
Imagine a child that tries to ride a bike, and you lift it off the bike as soon as it has ridden one meter. It will probably be able to ride one or even more meters straight, but it will never be able to do turns, etc.
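To make the exploration/exploitation point concrete, here is a minimal sketch of epsilon-greedy action selection in Python. The `q_network.predict` interface is an assumption for illustration, not part of your project; the point is that only a fraction epsilon of the actions are random, while the rest follow the learned policy, so the agent actually gets to test what it has learned.

```python
import random
import numpy as np

def choose_action(q_network, state, epsilon, n_actions):
    """Epsilon-greedy: act randomly with probability epsilon, else greedily."""
    if random.random() < epsilon:
        return random.randrange(n_actions)  # explore: uniform random action
    q_values = q_network.predict(state)     # hypothetical Q-network interface
    return int(np.argmax(q_values))         # exploit: best known action
```

Training only on random play is the frozen epsilon = 1 extreme of this scheme, which is exactly why the learned policy never gets tested.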

Related

What if a robot breaks in the process of reinforcement learning

Let's say I try to make a jumping robot with RL. But RL requires trial and error, and of course my robot will fail at jumping many times at the beginning.
How do developers teach a robot that could break while learning?
What if a robot breaks in the process of reinforcement learning?
Then you have a broken robot.
How do people teach a robot that could break while learning with RL?
I would do it like this:
Make a simulation. There are physics simulators out there, so first make sure your RL agent acts reasonably in one of those.
Have constraints: maybe you don't want to let it jump directly. First try to make it stand.
Loosen constraints: once one task is solved, go for a more complex one, or one with a higher risk of damaging the hardware.
And, of course, add cables to the robot that catch it if it falls. I remember seeing that for robots from Boston Dynamics, but I can't find the videos right now.
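One way to express the "constraints first, then loosen" idea in software is to clamp the agent's actions to a safe range and widen that range as training progresses. Below is a minimal Python sketch assuming a Gym-style `env.step` interface; the wrapper, the bounds, and the annealing schedule are illustrative assumptions, not any specific robot API.

```python
import numpy as np

class SafeActionWrapper:
    """Clamp actions to a safe band and widen it over the course of training."""

    def __init__(self, env, start_limit=0.2, full_limit=1.0, anneal_steps=100_000):
        self.env = env                  # assumed Gym-style environment
        self.start_limit = start_limit  # tight bound while learning to stand
        self.full_limit = full_limit    # final bound once behaviour is trusted
        self.anneal_steps = anneal_steps
        self.steps = 0

    def step(self, action):
        frac = min(1.0, self.steps / self.anneal_steps)
        limit = self.start_limit + frac * (self.full_limit - self.start_limit)
        self.steps += 1
        safe_action = np.clip(action, -limit, limit)  # keep torques in the safe band
        return self.env.step(safe_action)
```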

Chess Engine - Confusion in engine types - Flash as3

I am not sure whether this kind of question has been asked and answered before, but as far as my search goes, I haven't found an answer yet.
First let me tell you my scenario.
I want to develop a chess game in Flash AS3. I have developed the interface, and I have coded the movement of the pieces and the movement rules. (Please note: only movement rules so far, not capture rules.)
Now the problem is, I need to implement the AI for a one-player game. I am feeling helpless: though I know each and every rule of chess, applying AI is not simple at all.
And my biggest confusion is this: I have been searching, and all of my searches tell me about chess engines, but I always get confused between two types of engines. One is a front end, and the other is the real engine, yet nothing specifies (or I might not get it) which one is which.
I need something like an API through which I can get the right piece to move, according to the difficulty level. Is there anything like that?
Please note: I want something open source that can be used in Flash.
Thanks.
First of all, http://nanochess.110mb.com/archive/toledo_javascript_chess_3.html is the original project, which implements a relatively simple AI (I think it only searches 2 moves deep) in JavaScript. Since that was a contest project for minimal code, it is "obfuscated" somewhat by hand-made reduction of the source code. Here someone has tried to restore the same code to a more or less readable form: https://github.com/bormand/nanochess .
I think it might be a little too difficult to write one yourself, given that you have no background in AI... I mean, a good engine needs to calculate more than two moves ahead, but just to give you some numbers: the number of possible moves per step, with all pieces on the board, is roughly 140 at most; the second step is every combination of those moves with every possible reply of the opponent, and so on, i.e. on the order of 140 * 140 * 140 for three steps. This means you need a very good technique to discriminate against the bad moves and only explore the promising ones.
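The standard technique for that kind of discrimination is minimax search with alpha-beta pruning. Here is a minimal, game-agnostic Python sketch; `moves`, `result`, and `score` are hypothetical callbacks that in a chess engine would be your move generator, move application, and evaluation function.

```python
def alphabeta(pos, depth, alpha, beta, maximizing, moves, result, score):
    """Generic minimax with alpha-beta pruning.

    moves(pos)     -> iterable of legal moves
    result(pos, m) -> successor position after move m
    score(pos)     -> static evaluation from the maximizing side's view
    """
    legal = list(moves(pos))
    if depth == 0 or not legal:
        return score(pos)
    if maximizing:
        best = float("-inf")
        for m in legal:
            best = max(best, alphabeta(result(pos, m), depth - 1,
                                       alpha, beta, False, moves, result, score))
            alpha = max(alpha, best)
            if alpha >= beta:   # the opponent would never allow this line,
                break           # so prune the remaining moves
        return best
    best = float("inf")
    for m in legal:
        best = min(best, alphabeta(result(pos, m), depth - 1,
                                   alpha, beta, True, moves, result, score))
        beta = min(beta, best)
        if alpha >= beta:
            break
    return best
```

With good move ordering, pruning cuts the effective branching factor dramatically, which is what makes searching more than two or three moves ahead feasible at all.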
As of today, there is no known deterministic winning strategy for chess (in other words, it has not been solved by computers, unlike some other board games), which means it is a fairly complex game; but an AI that can play at a hobbyist level isn't all that difficult to come up with.
Recommended further reading: http://aima.cs.berkeley.edu/
A Chess Program these days comes in two parts:
The User Interface, which provides the chess board, moves view, clocks, etc.
The Chess Engine, which provides the ability to play the game of chess.
These two programs communicate via a simple text protocol (UCI or XBoard): the UI runs the chess engine as a child process and talks to it over pipes.
This has several significant advantages:
You only need one UI program which can use any compliant chess engine.
Time to develop the chess engine is reduced as only a simple interface need be provided.
It also means that the developers get to do the stuff they are good at and don't necessarily have to be part of a team to get the other part finished. Note that there are many more chess engines than chess UIs available today.
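To illustrate the two-program model (outside of Flash), here is a minimal Python sketch that drives a UCI engine over pipes. It assumes some UCI-compliant engine binary, here called "stockfish", is on the PATH; `uci`, `position`, and `go` are standard UCI commands, and the engine eventually answers with a `bestmove` line.

```python
import subprocess

# Launch the engine as a child process and talk to it over stdin/stdout pipes.
engine = subprocess.Popen(
    ["stockfish"],  # any UCI-compliant engine binary would do
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

def send(command):
    engine.stdin.write(command + "\n")
    engine.stdin.flush()

send("uci")                           # protocol handshake
send("position startpos moves e2e4")  # set up the position after 1. e4
send("go depth 10")                   # ask for a 10-ply search

for line in engine.stdout:
    if line.startswith("bestmove"):   # the engine announces its chosen move
        print(line.strip())
        break
```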
You are coming to the problem with several disadvantages:
As you are using Flash, you cannot use this two-program approach (AFAIK Flash cannot use fork(), exec(), or posix_spawn()). You will therefore need to provide the entire solution yourself, and you should at least attempt to make it multi-threaded so that the engine can work while the user interacts with the UI.
You are using a language which is very slow compared to C++, which is what engines are generally developed in.
You have access to limited system resources, especially memory. You might be able to override this with some setting of the Flash runtime.
If you want your program to actually play chess then you need to solve the following problems:
Move Generator: Generates all legal moves in a position. Some engine implementations don't worry about the "legal" part and prune illegal moves some time later. However, you still need to detect check, mate, and stalemate conditions at some point.
Position Evaluation: Provides a score for a given position. If you cannot determine whether one position is better for one side than another, then you have no way of finding winning moves.
Move Tree and Pruning: You need to store the move sequences you are evaluating, and a way to prune (ignore) branches that don't interest you (normally because you have determined they are weak). A chess move tree is vast, given every possible reply to every possible move, and pruning is the way to manage this.
Transposition Table: There are many transpositions in chess (the same position reached by making moves in a different order). One way to avoid re-evaluating positions you have already seen is to store each position's score in a transposition table. To do that you need a hash key for the position, normally implemented as a Zobrist hash; see the sketch after this list.
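As promised, a minimal Zobrist hashing sketch in Python. The 12 piece types (6 per side) over 64 squares and the `(piece, square)` board representation are illustrative assumptions; the essential trick is the XOR structure, which also gives cheap incremental updates when a piece moves.

```python
import random

random.seed(42)  # fixed seed so the keys are reproducible between runs

# One random 64-bit key per (piece type, square) pair.
ZOBRIST = [[random.getrandbits(64) for _ in range(64)] for _ in range(12)]

def zobrist_hash(pieces):
    """pieces: iterable of (piece_index, square_index) pairs (assumed layout)."""
    h = 0
    for piece, square in pieces:
        h ^= ZOBRIST[piece][square]  # XOR in each piece's key
    return h

def move_piece(h, piece, from_sq, to_sq):
    """Incremental update: XOR the piece out of its old square, in at the new."""
    return h ^ ZOBRIST[piece][from_sq] ^ ZOBRIST[piece][to_sq]
```

A real implementation would also fold side-to-move, castling rights, and en passant state into the key, but the scheme is the same.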
The best sites to get more detailed information (I am not a chess engine author) would be:
TalkChess Forum
Chess Programming Wiki
Good luck and please keep us posted of your progress!

How to design and write a game efficiently?

I'm writing a very simple Java Game. Let me describe it briefly:
There are 4 players on a map.
The map is a two-dimensional matrix in which each node has a value called "height".
The height difference between two nodes is the cost of the edge between them.
Dijkstra's algorithm is used to help a player navigate from a source to a destination (see the sketch after this list).
The four players take turns making a move. There are 8 possible moves (top left, top, top right, ...).
If players meet, they fight for gold; otherwise they move toward their targets.
As they move, their strength decreases by the height difference between the two nodes.
... etc
....
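For reference, here is a minimal sketch of the pathfinding rule described above: Dijkstra's algorithm over an 8-connected grid where the edge cost is the height difference. The grid representation and the cost function are assumptions based on the description, not the actual assignment code.

```python
import heapq

def dijkstra(height, source, target):
    """Cheapest path on a grid; height is a 2D list, cells are (row, col)."""
    rows, cols = len(height), len(height[0])
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == target:
            break
        if d > dist.get((r, c), float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):  # the 8 neighbouring cells
                if dr == 0 and dc == 0:
                    continue
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    nd = d + abs(height[nr][nc] - height[r][c])
                    if nd < dist.get((nr, nc), float("inf")):
                        dist[(nr, nc)] = nd
                        prev[(nr, nc)] = (r, c)
                        heapq.heappush(heap, (nd, (nr, nc)))
    path, node = [], target  # walk the predecessor chain back to the source
    while node != source:
        path.append(node)
        node = prev[node]
    path.append(source)
    return list(reversed(path))
```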
The problem I'm encountering is that the source code is getting longer and more complex day by day. I think I'm using the wrong approach somehow, and I feel so tired of constantly changing the implementation. Here is my approach:
Write out all requirements.
Create all the objects that I need, with all their getters and setters.
Create a static class to test the logic.
Create unit tests while putting the logic together.
Add some more code, then change the code to fit the tests.
Write a big method that runs, then break it down into smaller methods and write unit tests again.
If everything works fine, add more requirements and more code.
Then things get complicated, because the more code I add, the more the complexity increases. I no longer have time to write unit tests, because creating a test case now requires too much work.
Re-design, then change the implementation, go to step 1 again.
I come from a C++ background, and I'm only comfortable writing 'static' libraries such as stacks, queues, linked lists, trees... A game is a really big challenge for me, especially since I have to use Java. I understand that the core of programming is the same, so picking up Java was not really that bad; however, the time spent looking up Java's API is not small. Further, the game logic is really hard to write: when this object moves, other objects are affected..., so creating a test for one method depends on many other methods... etc.
I really need advice. Could anyone share some experience of how to write a game? I only have two weeks left for this assignment. I currently have 45 classes, and I feel so lost because the more I write, the more complex it gets :( !
Best regards,
Chan Nguyen
First, start thinking like a Java programmer. Think of everything in your game as an object, like the board: think about the properties and methods it has, its interfaces, and how it interacts with the other objects.
If you need help getting started, here is a great tutorial that guides you step by step through a simple Java game; it might put you in the right frame of mind to start programming your own. I strongly recommend you follow the tutorial, http://www.cokeandcode.com/asteroidstutorial , and use the libraries they used for developing the interface there.
I view game code architects with respect: games are complex systems with emergent properties at runtime and unusually intense interaction requirements (UI, controls), which makes a lot of OOP theory of questionable value. It can be difficult to reuse game code, and a lot of upfront planning work is wasted time.
Most game coders I know, beginning or veteran, succeed with a "just do it" iterative process, e.g.:
1) Write a minimal prototype. Get a very basic system working, using the simplest, most obvious architecture you can think of (my guy can run around the screen). 5 or 10 objects max.
2) Add functionality (points, rules, traps, NPC behaviours, etc.) and playtest, over and over. This hack-on-hack approach makes for poorly structured code, but most coders can make it work.
3) Rewrite. Programmers grit their teeth at some of the hacking they had to do in (2) and will want to throw it all out and rewrite. Resist this urge until the game is testable (as in, players can sometimes enjoy it, somewhat), or until a new feature would require a rewrite anyway. Then rewrite pretty much EVERYTHING from scratch. This goes WAY faster than you'd expect, and results in solid, well-structured code.
Game coders do test, but comprehensive testing of ALL code is rare, for two reasons: emergence and culture. Games have emergent properties at runtime ("yeah, but the points COULD go negative when the NPC is killed when ...."). And since games are usually for entertainment purposes, there is a culture of fast-and-loose testing; games aren't as important as, say, missile control code.
I expect others with more coding experience will answer this. (I have written a fair bit of code, but I tend toward a quick-and-dirty, script-type coding style; I know lots of coders who are way better than me.)

Order-issuing neural network?

I'm interested in writing software that uses machine learning and performs certain actions based on external data.
However, I've run into a problem (one that has always interested me):
how is it possible to write machine learning software that issues orders or sequences of orders?
The problem is that, as I understand it, a neural network gets a bunch of inputs and "recalls" an output based on the results of previous training. Instantly (well, more or less). So I'm not sure how "issuing orders" could fit into that system, especially when the actions performed by the system affect the system itself with a certain delay. I'm also a bit unsure how it is possible to train such a thing.
Examples of such systems:
1. First-person shooter enemy controller. As I understand it, it is possible to implement a neural network controller for a bot that switches between behaviour strategies (well, assigns priorities to them) based on some inputs (probably something like health, ammo, etc.). But I don't see a way to make a higher-order controller that could issue a sequence of commands like "go there, then turn left". Also, the bot's actions will affect the variables that control its behaviour: shooting reduces ammo, falling from heights reduces health, etc.
2. Automated market trader. It is certainly possible to make a system that tries to predict the next market price of something. However, I don't see how it is possible to make a system that would issue an order to buy something, watch the trend, and then sell it back to gain profit or cover losses.
3. Car driver. Again, as I understand it, it is possible to make a system that maintains a desired movement vector based on position/velocity/torque data and the results of previous training. However, I don't see a way to make such a system (learn to) perform a sequence of actions.
That is, as I understand it, a neural net is technically a matrix: you give it input, and it produces output. But what about generating sequences of actions that could change the environment the program operates in?
If such tasks are not entirely suitable for neural networks, what else could be used?
P.S. I understand that the question isn't exactly clear, and I suspect that I'm missing some knowledge. So I'd appreciate some pointers (i.e. books/resources to read, etc.).
You could try to connect the output neurons to controllers directly, e.g. moving forward, turning, or shooting in the first-person shooter, or buy orders for the trader. However, I think the best results nowadays are obtained by letting the neural net solve one rather specific subproblem, and then letting a "normal" program interpret its answer. For example, you could let the neural net construct a map overlay of "where do I want to be", which the bot then translates into movements. The neural network for the trader could produce "how much do I want which security", which the bot then translates into buy or sell orders.
The decision about which subproblem should be solved by a neural network is a very central one for its design. The important thing is that good solutions can be taught to the neural network.
Edit: Expanding on the examples: when the first-person-shooter bot gets shot, it should not have wanted to be there; when it gets to shoot someone else, it should have wanted to be there more. When the trader loses money on a security, it should have wanted it less before; if it gains, it should have wanted it more. These things can be taught.
The problem you are describing is known as reinforcement learning. Reinforcement learning is essentially a machine learning algorithm (such as a neural network) coupled with a controller. It has been used for all of the applications you mention, even to drive real cars.
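To make the reinforcement-learning answer concrete, here is a minimal sketch of tabular Q-learning in Python. The `env` interface (with `reset()`, `step(action)`, and an `actions` list) is an assumed Gym-style stand-in, not a specific library API; the point is that the agent's own actions determine the next state, so it learns a whole sequence of decisions from delayed reward.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: mostly follow the learned policy, sometimes explore.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions)
            # Bootstrapped update: credit this action with reward that may
            # only arrive many steps later, via the value of the next state.
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```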

Modeling an RTS or how Blizzard was able to put together Starcraft 1 & 2?

Not sure if this question is related to software development, but I hope someone can at least point me in the right direction (no, not that direction...).
I am very curious how Blizzard achieves such a balance of strategic/tactical forces in their games. If you look at StarCraft 1, or now 2, each race has unique features that in a way counter the unique features of the other races, and all together they create a pretty beautiful (to my mind, at least) balance.
Is there some sort of area of mathematics that could help model these things? How do they do it, basically?
I don't have a full answer, but here is what I know. Initially, when a game is technically ready, the balance is not ideal. When they started the first public beta there were holes in the balance, which they patched very fast. They let the players (testers) play the game as-is, captured statistics on the percentage of wins per race, and tuned the parameters accordingly. By the end of the beta the ratio was almost ideal: 33%/33%/33%.
I've no idea how Blizzard specifically did it - it might have just been through a lot of user testing.
However, the entire field of CS and statistics dedicated to these kinds of problems is simulation. Generally, the idea is to construct a model of your system and then generate inputs according to some statistical distribution, to try to understand the behaviour of that model. In the case of game balance, you would want to show that most sequences of game events lead to some kind of equilibrium. This would probably involve some kind of stochastic modeling.
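As a minimal illustration of that simulation idea, the Python sketch below estimates per-race win rates by Monte Carlo. The `simulate_match` model here is a deliberately toy stand-in (a lopsided three-way coin), not a real game model; in practice it would be replaced by the stochastic model of the game described above.

```python
import random

def estimate_win_rates(simulate_match, n_games=100_000):
    """Run many simulated matches and estimate each race's win rate."""
    wins = {"terran": 0, "zerg": 0, "protoss": 0}
    for _ in range(n_games):
        wins[simulate_match()] += 1
    return {race: count / n_games for race, count in wins.items()}

# Toy model: terran slightly favoured, to show how an imbalance
# surfaces in the estimated rates (the ideal would be ~33% each).
toy_model = lambda: random.choices(
    ["terran", "zerg", "protoss"], weights=[36, 32, 32]
)[0]
print(estimate_win_rates(toy_model))
```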
Linear algebra, maybe a little calculus, and a lot of testing.
There's no real mystery to this sort of thing, but people often just don't think about it in the rigorous terms you need to get a system that is fairly well-balanced.
Basically, the designers know how quickly you can gather resources (both the best case and the average case), they know how long it takes to build a unit, and they know roughly how powerful a unit is (e.g. by reference to approximations such as damage per second). With this, you can ensure a unit's cost in resources makes sense. From that, it's possible to compare resource gathering with unit cost to model the strength of a force growing over time. Similarly, you can measure a force's capacity for damage over time and compare the two. If there's a big disparity, they can tweak one or more of the variables to reduce it. Obviously there are many permutations of units, but the interesting thing is that you only really need to understand the most powerful ones: if a player picks a poor combination, it's OK if they lose. You don't want every approach to be equally good, as that would imply a boring game where your decisions are meaningless.
This is, of course, a simplification, but it helps get everything in roughly the right place and ensures that there are no units that are useless. Then testing can hammer down the rough edges and find most of the exploits that remain.
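A toy back-of-the-envelope version of that model, with every number invented purely for illustration: convert gathered resources into units, and units into force-wide damage per second, then compare two unit types over time.

```python
GATHER_RATE = 10.0  # resources per second (made-up figure)

def force_dps(unit_cost, unit_dps, build_time, elapsed):
    """DPS of a force funded by `elapsed` seconds of income.

    Production is assumed serial (one unit per build_time), so the force
    is limited both by income and by production speed.
    """
    units_afforded = (GATHER_RATE * elapsed) / unit_cost
    units_built = min(units_afforded, elapsed / build_time)
    return units_built * unit_dps

for t in (60, 180, 300):
    cheap = force_dps(unit_cost=50, unit_dps=5, build_time=20, elapsed=t)
    heavy = force_dps(unit_cost=200, unit_dps=25, build_time=40, elapsed=t)
    print(f"t={t:>3}s  cheap force DPS={cheap:6.1f}  heavy force DPS={heavy:6.1f}")
```

If one unit type dominates the other at every time scale, that disparity is exactly the signal that would lead the designers to tweak a cost or build time.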
If you've tried SC2, you've noticed that Blizzard records the relevant data for each game played on B.Net. They did the same in WC3, and presumably in SC1. With enough games stored, it is possible to get accurate results from statistical analysis. For example, Protoss should win about a third of all match-ups between similarly skilled opponents. If this is not the case, Blizzard can analyze the games where Protoss won versus the games where they lost, and see which units made the difference. Then they can nerf those units (with a bit of in-house testing) and introduce the change in the next patch. Balancing is a difficult problem: a change that fixes problems in top-level games may break the balance in mid-level games.
Proper testing in a closed beta is impossible; you just can't accumulate enough data. This is why Blizzard does open betas. Even so, the real test is the release of the game.