How can I generate the previous state of Conway's Game of Life? - reverse

Suppose we have the i-th state of Conway's Game of Life. How can I generate a previous state from it?
I have no idea how to approach this problem.

Conway's Game of Life, one of the most famous cellular automaton rules, is not reversible: for instance, it has many patterns that die out completely, so the configuration in which all cells are dead has many predecessors, and it also has Garden of Eden patterns with no predecessors. However, another rule called "Critters" by its inventors, Tommaso Toffoli and Norman Margolus, is reversible and has similar dynamic behavior to Life.
From https://en.wikipedia.org/wiki/Reversible_cellular_automaton
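To make the non-reversibility concrete, here is a minimal brute-force sketch that enumerates every candidate grid and keeps those that step to a given target. It assumes a small bounded board where cells outside the grid are always dead, which is a simplification of the infinite Life plane; the names `life_step` and `find_predecessors` are made up for this illustration.

```python
from itertools import product


def life_step(grid):
    """Advance a bounded Life grid (tuple of tuples of 0/1) by one generation.

    Assumption: cells outside the grid are treated as permanently dead.
    """
    rows, cols = len(grid), len(grid[0])

    def live_neighbours(r, c):
        return sum(
            grid[r + dr][c + dc]
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols
        )

    def next_cell(r, c):
        n = live_neighbours(r, c)
        # Birth on exactly 3 live neighbours; survival on 2 or 3.
        return 1 if n == 3 or (grid[r][c] == 1 and n == 2) else 0

    return tuple(tuple(next_cell(r, c) for c in range(cols)) for r in range(rows))


def find_predecessors(target):
    """Return every grid of the same shape that steps to `target`."""
    rows, cols = len(target), len(target[0])
    found = []
    for bits in product((0, 1), repeat=rows * cols):
        candidate = tuple(
            tuple(bits[r * cols:(r + 1) * cols]) for r in range(rows)
        )
        if life_step(candidate) == target:
            found.append(candidate)
    return found


if __name__ == "__main__":
    all_dead = ((0, 0), (0, 0))
    one_alive = ((1, 0), (0, 0))
    print(len(find_predecessors(all_dead)))   # many predecessors
    print(len(find_predecessors(one_alive)))  # none on this bounded board
```

On a 2x2 bounded board the all-dead state has 11 predecessors while a single live cell has none, so "the" previous state is either ambiguous or missing. The search space grows as 2^(rows*cols), so exhaustive enumeration is only practical for tiny boards.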

How can we design rewards for an RL algorithm to incentivize a group metric?

I am designing a reinforcement learning agent to guide individual cars within a bounded area of roads. The policy determines which route the car should take.
Each car can see the cars within 10 miles of it, their velocities, and the road graph of the whole bounded area. The policy of the RL-based agent must determine the actions of the cars in order to maximize the flow of traffic, let's say defined as reduced congestion.
How can we design rewards to incentivize each car to not act greedily and maximize just its own speed, but rather minimize congestion within the bounded area overall?
I tried writing a Q-learning based method for routing each vehicle, but this ended up compelling every car to greedily take the shortest route, producing a lot of congestion by crowding the cars together.
It's good to see more people working on cooperative MARL. Shameless plug for my research effort, feel free to reach out to discuss.
I think you need to take a step back from your question. You ask how to design the rewards so that the agents will benefit the environment rather than themselves. If you wanted, you could simply give each agent a reward based on the total welfare of the population. This will probably work, but you probably don't want that because it defeats the purpose of a multi-agent environment, right?
If you want the agents to be selfish but somehow converge to a cooperative solution, this is a very difficult problem (which is what I'm working on.)
If you're okay with a compromise, you could use intrinsic motivation, like in these papers:
Jaques 2018: Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
Vinitsky 2021: A learning agent that acquires social norms from public sanctions in decentralized multi-agent settings
Hughes 2018: Give agents a reward when there's low inequality in the population
What all of these papers have in common is that they add another component to the reward of each agent. That component is pro-social, like incentivizing the agent to increase its influence over the actions of other agents. Still it's a less extreme solution than just making the reward be social welfare directly.
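The idea of adding a pro-social component can be sketched as a simple blend between each agent's own reward and the population mean. This is an illustrative stand-in, not the formula from any of the papers above; the function name `blended_rewards` and the parameter `prosocial_weight` are assumptions.

```python
def blended_rewards(individual_rewards, prosocial_weight):
    """Mix each agent's own reward with the population's mean reward.

    prosocial_weight = 0.0 gives purely selfish rewards;
    prosocial_weight = 1.0 gives every agent the mean (pure social welfare).
    """
    if not 0.0 <= prosocial_weight <= 1.0:
        raise ValueError("prosocial_weight must lie in [0, 1]")
    mean_reward = sum(individual_rewards) / len(individual_rewards)
    return [
        (1.0 - prosocial_weight) * reward + prosocial_weight * mean_reward
        for reward in individual_rewards
    ]


if __name__ == "__main__":
    # Hypothetical per-car rewards, e.g. negative travel time for this step.
    car_rewards = [-4.0, -1.0, -10.0]
    print(blended_rewards(car_rewards, prosocial_weight=0.3))
```

In the traffic setting, the individual term could be a car's own travel time and the shared term a congestion measure over the bounded area. The weight then controls how far each car is pulled away from purely greedy routing.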

Why introduce Markov property to reinforcement learning?

As a beginner in deep reinforcement learning, I am confused about why we should use a Markov process in reinforcement learning, and what benefits it brings. In addition, a Markov process requires that, given the "present", the "future" has nothing to do with the "past". Why, then, can some deep reinforcement learning algorithms use RNNs and LSTMs? Does this violate the Markov process's assumption?
The Markov property is used for the math to work out in the optimization process. Do keep in mind, however, that it is much more generally applicable than you might think. For example, if in a certain board game you need to know the last three states of the game, this might seem to violate the Markov property; however, if you simply redefine your "state" to be the concatenation of the last three states, you are back in an MDP.
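The state-redefinition trick can be sketched with a fixed-length buffer of recent observations. The class name `HistoryState` and its interface are made up for illustration; the choice to pad the buffer with the first observation is also an assumption.

```python
from collections import deque


class HistoryState:
    """Expose the last `history_length` observations as one composite state."""

    def __init__(self, history_length, initial_observation):
        # Pad with the first observation so the state always has a fixed size.
        self._frames = deque(
            [initial_observation] * history_length, maxlen=history_length
        )

    def push(self, observation):
        """Record a new observation and return the updated composite state."""
        self._frames.append(observation)  # the oldest entry is dropped
        return self.state

    @property
    def state(self):
        return tuple(self._frames)


if __name__ == "__main__":
    history = HistoryState(history_length=3, initial_observation="s0")
    for observation in ("s1", "s2", "s3"):
        print(history.push(observation))
```

An agent that learns over `history.state` instead of the raw observation sees everything the three-state rule needs. A recurrent network plays a similar role by carrying a learned summary of the past rather than an explicit window.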
This assumption says that the current state gives all the information about the past agent-environment interaction that makes a difference for the future of the system. It is an important definition because it lets you define the dynamics of the process as p(s',r | s, a). In practical terms, you don't need to look at all the previous states of the system to determine the next possible states.
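Written out, the assumption is that conditioning on the full history gives the same distribution as conditioning on the current state and action alone. The time-indexed notation below (S_t, A_t, R_t) is a common textbook convention, and is an assumption here rather than something taken from the answer.

```latex
\Pr\left(S_{t+1}=s',\, R_{t+1}=r \mid S_0, A_0, \dots, S_t, A_t\right)
  = \Pr\left(S_{t+1}=s',\, R_{t+1}=r \mid S_t, A_t\right)

p(s', r \mid s, a) \doteq \Pr\left(S_{t+1}=s',\, R_{t+1}=r \mid S_t=s,\, A_t=a\right)
```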

gorilla vs shark - Who would win - Resolving deadlocks in Utility theory game AI

Context: Utility theory Game AI
Question: How to resolve deadlocks?
Scenario: RTS game - both sides have only one unit left in the game and no resources to build more.
Consider a Gorilla and a Shark that are the only two entities who exist in a closed world (simulation)
Both the Gorilla and the Shark are driven by Utility theory AI
Neither has any knowledge of the other's capabilities.
Neither has any knowledge of who would win if a fight were to take place.
Neither has any knowledge of the location of the other.
From the point of view of the Gorilla (Could just as easily be the Shark):
In order to maximize utility it must fight with and defeat the shark.
In order to fight with the Shark the Gorilla must first scout for the Shark. However, in doing so it might "run into" the Shark and end up in a fight.
If the Gorilla were to fight with the Shark it might lose, in which case this would be the opposite of maximizing utility.
Therefore, in these circumstances a Utility theory AI simply ends up with the Gorilla and the Shark intentionally avoiding each other.
This then becomes a deadlock, because avoiding the other is also a form of maximizing utility.
Breaking the deadlock requires making the game AI use less-than-optimal rules at random, for example a "random unplanned attack".
Essentially, how do you model "outcome unknown" scenarios (or scenarios where the actions are non-deterministic) in a Utility theory AI? Or is introducing random behaviors the only way?
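The "less than optimal rules at random" idea from the question can be sketched as an occasional random pick in place of the highest-utility action. The utilities, action names, and the `epsilon` parameter below are all hypothetical, and this only illustrates the random-behavior option rather than answering whether it is the only way.

```python
import random


def choose_action(utilities, epsilon, rng=random):
    """Pick the highest-utility action, except with probability `epsilon`,
    when a uniformly random action is taken instead."""
    actions = list(utilities)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=utilities.get)


if __name__ == "__main__":
    # Hypothetical utilities from the Gorilla's point of view: avoiding looks
    # best, so a purely greedy agent never leaves the deadlock.
    gorilla_utilities = {"avoid": 0.6, "scout": 0.4, "attack": 0.3}
    rng = random.Random(42)
    picks = [choose_action(gorilla_utilities, 0.1, rng) for _ in range(20)]
    print(picks)
```

With `epsilon` set to zero both units keep choosing "avoid". Any positive value means one of them will eventually scout or attack, which breaks the stalemate.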

Where to start with speech synthesis

You guys may be familiar with Google's TTS engine: here.
I have a basic understanding of how something like that is able to analyze the input and pick out different syllables/parts of speech, but where would I start if I wanted to create a "voice" for a TTS system?
That's a question that I spent nearly a semester in college learning the answer to, and a year (or more) of classes beforehand to learn the underlying signal processing required to understand the process. Whole classes are devoted to speech synthesis, and whole curriculums to signal processing.
One can think of the human vocal tract as a filter, and the glottis as an impulse generator—that is, speech is actually the result of an impulse train filtered by the vocal tract, mouth, and nasal cavity.
For every phoneme, the "filter" will be different, so you will need a library of phonemes to generate "filters" for. Theoretically, inverse filtering could be used on a library of phoneme sound clips to find "filter" coefficients. The Levinson-Durbin recursion is often used to find LPC coefficients.
A glottal pulse train must be created. A simple way to do this is to convolve a pulse train with a positive half-sine wave.
Finally, filter the glottal pulse train with the "filter" coefficients associated with the phoneme you wish to create.
But that's only for voiced speech. In order to generate unvoiced speech, a simple solution is to filter a random noise signal with "filter" coefficients associated with unvoiced speech phonemes.
One layer of abstraction above that, create a list of phonemes needed, and concatenate. Simple as pie!
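The source-filter steps above can be sketched in pure Python. Everything numeric here is an assumption for illustration: the 8 kHz sample rate, the 100 Hz pitch, and especially `EXAMPLE_LPC`, which is a single made-up resonance rather than coefficients estimated from a real phoneme.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz, assumed


def impulse_train(num_samples, period):
    """One unit impulse every `period` samples (the raw glottal excitation)."""
    return [1.0 if n % period == 0 else 0.0 for n in range(num_samples)]


def half_sine(length):
    """Positive half of a sine wave, used to shape each glottal pulse."""
    return [math.sin(math.pi * n / length) for n in range(length)]


def convolve(signal, kernel):
    """Direct-form linear convolution."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        if s != 0.0:
            for j, k in enumerate(kernel):
                out[i + j] += s * k
    return out


def all_pole_filter(excitation, lpc):
    """Apply 1 / (1 + a1 z^-1 + ... + ap z^-p), with lpc = [a1, ..., ap]."""
    out = []
    for n, x in enumerate(excitation):
        y = x
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                y -= a * out[n - k]
        out.append(y)
    return out


def synthesize_voiced(lpc, num_samples, pitch_period, pulse_length):
    """Glottal pulse train filtered by the vocal-tract filter."""
    pulses = convolve(impulse_train(num_samples, pitch_period), half_sine(pulse_length))
    return all_pole_filter(pulses[:num_samples], lpc)


def synthesize_unvoiced(lpc, num_samples, rng):
    """Random noise filtered by the vocal-tract filter."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    return all_pole_filter(noise, lpc)


# Made-up two-pole resonance near 700 Hz with pole radius 0.97.
_theta = 2.0 * math.pi * 700.0 / SAMPLE_RATE
EXAMPLE_LPC = [-2.0 * 0.97 * math.cos(_theta), 0.97 ** 2]

if __name__ == "__main__":
    voiced = synthesize_voiced(EXAMPLE_LPC, 800, SAMPLE_RATE // 100, 20)
    unvoiced = synthesize_unvoiced(EXAMPLE_LPC, 800, random.Random(0))
    utterance = voiced + unvoiced  # concatenate "phonemes"
    print(len(utterance))
```

A real voice would replace `EXAMPLE_LPC` with one coefficient set per phoneme, for example obtained from recordings via the Levinson-Durbin recursion mentioned above.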
UPDATE:
A friend pointed out Festival, a "black box" to input text and get speech out: http://festvox.org/festival/

What techniques exist for the software-driven locomotion of a bipedal robot?

I'm programming a software agent to control a robot player in a simulated game of soccer. Ultimately I hope to enter it in the RoboCup competition.
Amongst the various challenges involved in creating such an agent, the motion of its body is one of the first I'm facing. The simulation I'm targeting uses a Nao robot body with 22 hinges to control: six in each leg, four in each arm and two in the neck.
I have an interest in machine learning and believe there must be some techniques available to control this guy.
At any point in time, it is known:
The angle of all 22 hinges
The X,Y,Z output of an accelerometer located in the robot's chest
The X,Y,Z output of a gyroscope located in the robot's chest
The location of certain landmarks (corners, goals) via a camera in the robot's head
A vector for the force applied to the bottom of each foot, along with a vector giving the position of the force on the foot's sole
The types of tasks I'd like to achieve are:
Running in a straight line as fast as possible
Moving at a defined speed (that is, one function that handles fast and slow walking depending upon an additional input)
Walking backwards
Turning on the spot
Running along a simple curve
Stepping sideways
Jumping as high as possible and landing without falling over
Kicking a ball that's in front of your feet
Making 'subconscious' stabilising movements when subjected to unexpected forces (hit by ball or another player), ideally in tandem with one of the above
For each of these tasks I believe I could come up with a suitable fitness function, but not a set of training inputs with expected outputs. That is, any machine learning approach would need to offer unsupervised learning.
I've seen some examples in open-source projects of circular functions (sine waves) wired into each hinge's angle with differing amplitudes and phases. These seem to walk in straight lines ok, but they all look a bit clunky. It's not an approach that would work for all of the tasks I mention above though.
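The sine-wave approach mentioned above amounts to giving each hinge its own oscillator. Here is a minimal sketch; the hinge names, amplitudes, phases, and stride frequency are invented for illustration and do not come from any particular RoboCup team's code.

```python
import math


def hinge_angle(t, amplitude, frequency_hz, phase, offset=0.0):
    """Target angle (degrees) of one hinge at time t (seconds)."""
    return offset + amplitude * math.sin(2.0 * math.pi * frequency_hz * t + phase)


# Hypothetical gait table: hinge name -> (amplitude in degrees, phase in radians).
# The left and right legs are half a cycle apart so they alternate.
EXAMPLE_GAIT = {
    "left_hip_pitch": (20.0, 0.0),
    "right_hip_pitch": (20.0, math.pi),
    "left_knee_pitch": (30.0, math.pi / 2),
    "right_knee_pitch": (30.0, 3 * math.pi / 2),
}


def gait_targets(t, gait, frequency_hz=1.5):
    """Target angle for every hinge in the gait table at time t."""
    return {
        name: hinge_angle(t, amplitude, frequency_hz, phase)
        for name, (amplitude, phase) in gait.items()
    }


if __name__ == "__main__":
    for step in range(5):
        print(gait_targets(step * 0.1, EXAMPLE_GAIT))
```

Because the controller ignores every sensor, it cannot react to pushes or uneven contact. That is consistent with the clunky look described above, and it is why such a gait does not extend to the stabilising tasks.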
Some teams apparently use inverse kinematics, though I don't know much about that.
So, what approaches are there for robot biped locomotion/ambulation?
As an aside, I wrote and published a .NET library called TinMan that provides basic interaction with the soccer simulation server. It has a simple programming model for the sensors and actuators of the robot's 22 hinges.
You can read more about RoboCup's 3D Simulated Soccer League:
http://en.wikipedia.org/wiki/RoboCup_3D_Soccer_Simulation_League
http://simspark.sourceforge.net/wiki/index.php/Main_Page
http://code.google.com/p/tin-man/
There is a significant body of research literature on robot motion planning and robot locomotion.
General Robot Locomotion Control
For bipedal robots, there are at least two major approaches to robot design and control (whether the robot is simulated or physically real):
Zero Moment Point - a dynamics-based approach to locomotion stability and control.
Biologically-inspired locomotion - a control approach modeled after biological neural networks in mammals, insects, etc., that focuses on use of central pattern generators modified by other motor control programs/loops to control overall walking and maintain stability.
Motion Control for Bipedal Soccer Robot
There are really two aspects to handling the control issues for your simulated biped robot:
Basic walking and locomotion control
Task-oriented motion planning
The first part is just about handling the basic control issues for maintaining robot stability (assuming you are using some physics-based model with gravity), walking in a straight-line, turning, etc. The second part is focused on getting your robot to accomplish specific tasks as a soccer player, e.g., run toward the ball, kick the ball, block an opposing player, etc. It is probably easiest to solve these separately and link the second part as a higher-level controller that sends trajectory and goal directives to the first part.
There are a lot of relevant papers and books which could be suggested, but I've listed some potentially useful ones below that you may wish to include in whatever research you have already done.
Reading Suggestions
LaValle, Steven Michael (2006). Planning Algorithms, Cambridge University Press.
Raibert, Marc (1986). Legged Robots that Balance. MIT Press.
Vukobratovic, Miomir and Borovac, Branislav (2004). "Zero-Moment Point - Thirty Five Years of its Life", International Journal of Humanoid Robotics, Vol. 1, No. 1, pp 157–173.
Hirose, Masato and Takenaka, T (2001). "Development of the humanoid robot ASIMO", Honda R&D Technical Review, vol 13, no. 1.
Wu, QiDi and Liu, ChengJu and Zhang, JiaQi and Chen, QiJun (2009). "Survey of locomotion control of legged robots inspired by biological concept", Science in China Series F: Information Sciences, vol 52, no. 10, pp 1715–1729, Springer.
Wahde, Mattias and Pettersson, Jimmy (2002) "A brief review of bipedal robotics research", Proceedings of the 8th Mechatronics Forum International Conference, pp 480-488.
Shan, J., Junshi, C. and Jiapin, C. (2000). "Design of central pattern generator for humanoid robot walking based on multi-objective GA", In: Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1930–1935.
Chestnutt, J., Lau, M., Cheung, G., Kuffner, J., Hodgins, J., and Kanade, T. (2005). "Footstep planning for the Honda ASIMO humanoid", Proceedings of the 2005 IEEE International Conference on Robotics and Automation (ICRA 2005), pp 629-634.
I was working on a project not that dissimilar from this (making a robotic tuna), and one of the methods we were exploring was using a genetic algorithm to tune the performance of an artificial central pattern generator (in our case the pattern was a number of sine waves operating on each joint of the tail). It might be worth giving it a shot. Genetic algorithms are another one of those tools that can be incredibly powerful, if you are careful about selecting a fitness function.
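A genetic algorithm that tunes oscillator parameters can be sketched as follows. The fitness function here is a toy stand-in (distance to a hidden target vector) because a real one would have to run the simulator and measure something like distance walked. The names `evolve`, `toy_fitness`, and `HIDDEN_TARGET`, and all hyperparameters, are assumptions.

```python
import random


def evolve(fitness, genome_length, rng, population_size=20,
           generations=30, mutation_scale=0.1):
    """Tiny genetic algorithm with elitism, one-point crossover and
    Gaussian mutation. Requires genome_length >= 2.

    Returns the best genome found and the best fitness seen at the start
    of each generation.
    """
    population = [
        [rng.uniform(-1.0, 1.0) for _ in range(genome_length)]
        for _ in range(population_size)
    ]
    best_history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best_history.append(fitness(ranked[0]))
        parents = ranked[:population_size // 2]  # truncation selection
        children = [ranked[0]]                   # elitism: keep the best as-is
        while len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_length)  # one-point crossover
            child = mother[:cut] + father[cut:]
            children.append([g + rng.gauss(0.0, mutation_scale) for g in child])
        population = children
    return max(population, key=fitness), best_history


# Toy stand-in for "run the gait in the simulator and score it".
HIDDEN_TARGET = [0.5, -0.2, 0.8, 0.1]


def toy_fitness(genome):
    return -sum((g - t) ** 2 for g, t in zip(genome, HIDDEN_TARGET))


if __name__ == "__main__":
    best, history = evolve(toy_fitness, len(HIDDEN_TARGET), random.Random(0))
    print(best, history[0], history[-1])
```

In a locomotion setting the genome would hold the amplitudes and phases of the per-joint sine waves. Because each fitness evaluation then means a simulator run, population size and generation count become the main cost.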
Here's a great paper from 1999 by Peter Nordin and Mats G. Nordahl that outlines an evolutionary approach to controlling a humanoid robot, based on their experience building the ELVIS robot:
An Evolutionary Architecture for a Humanoid Robot
I've been thinking about this for quite some time now and I realized that you need at least two intelligent "agents" to make this work properly. The basic idea is that you have two types of intelligent activity here:
Subconscious Motor Control (SMC).
Conscious Decision Making (CDM).
Training for the SMC could be done on-line. If you really think about it, success within motor control can be defined like this: you provide a signal to your robot, it evaluates that signal and either accepts it or rejects it. If your robot accepts a signal and it results in a "failure", then your robot goes "offline" and it can't accept any more signals. Defining "failure" and "offline" could be tricky, but I was thinking that it would be a failure if, for example, a sensor on the robot indicates that the robot is immobile (lying on the ground).
So your fitness function for the SMC might be something of the sort: numAcceptedSignals/numGivenSignals + numFailure
The CDM is another AI agent that generates signals and the fitness function for it could be: (numSignalsAccepted/numSignalsGenerated)/(numWinGoals/numLossGoals)
So what you do is you run the CDM and all the output that comes out of it goes to the SMC... at the end of a game you run your fitness functions. Alternately you can combine the SMC and the CDM into a single agent and you can make a composite fitness function based on the other two fitness functions. I don't know how else you could do it...
Finally, you have to determine what constitutes a learning session: is it half a game, full game, just a few moves, etc. If a game lasts 1 minute and you have a total of 8 players on the field, then the process of training could be VERY slow!
Update
Here is a quick reference to a paper that used genetic programming to create "softbots" that play soccer: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.36.136&rep=rep1&type=pdf
With regards to your comments: I was thinking that for the subconscious motor control (SMC), the signals would come from the conscious decision maker (CDM). This way you're evolving your SMC agent to properly handle the CDM agent's commands (signals). You want to maximize the up-time of the SMC agent regardless of what the CDM agent says.
The SMC agent receives an input, for example a vector force on a joint, and it then runs it through its processing unit to determine whether it should execute that input or reject it. The SMC should only execute inputs that it "thinks" it can recover from, and it should reject inputs that it "thinks" would lead to a "catastrophic failure".
Now the SMC agent has an output: accept or reject a signal (1 or 0). The CDM can use that signal for its own training. The CDM wants to maximize the number of signals that the SMC accepts and it also wants to satisfy a goal: a high score for its own team and a low score for the opposing team. So the CDM has its own processing unit that is being evolved to satisfy both of those needs. Your reference provided a 3-layer design, while mine is only a 2-layer one. I think mine was a step in the right direction, towards the 3-layer design.
One more thing to note here: is falling really a "catastrophic failure"? What if your robot falls, but the CDM makes it stand up again? I think that would be a valid behavior, so you shouldn't penalize the robot for falling... perhaps a better thing to do is penalize it for the amount of time it takes in order to perform a goal (not necessarily a soccer goal).
There is this tutorial on humanoid locomotion control that describes the software stack used on the HRP-4 humanoid (which can walk or climb stairs). It consists mainly of:
Linear inverted pendulum: a simplified model for balancing. It involves only the center of mass (COM) and ZMP already mentioned in other answers.
Trajectory optimization: the robot computes what it wants to do, ideally, for the next 2 seconds or so. It keeps recomputing this trajectory as it moves, which is known as model predictive control.
Balance control: the last stage that corrects the robot's posture based on sensor measurements and the desired trajectory.
Follow links to the academic papers and source code to learn more.
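The linear inverted pendulum model mentioned above reduces the robot to a point mass at constant height, whose horizontal acceleration is proportional to its offset from the ZMP. Here is a minimal one-dimensional sketch using simple Euler integration; it is not code from the HRP-4 software stack, and the function name and parameter values are assumptions.

```python
GRAVITY = 9.81  # m/s^2


def simulate_lipm(com_position, com_velocity, zmp_position,
                  com_height, duration, dt=0.001):
    """Integrate the 1-D linear inverted pendulum dynamics

        com_acceleration = (g / h) * (com_position - zmp_position)

    with a fixed ZMP, and return the final COM position and velocity.
    """
    omega_squared = GRAVITY / com_height
    steps = int(round(duration / dt))
    for _ in range(steps):
        acceleration = omega_squared * (com_position - zmp_position)
        com_velocity += acceleration * dt
        com_position += com_velocity * dt
    return com_position, com_velocity


if __name__ == "__main__":
    # COM starts 1 cm ahead of the ZMP: it falls further forward over time.
    print(simulate_lipm(0.01, 0.0, 0.0, com_height=0.8, duration=0.5))
    # COM exactly above the ZMP: an (unstable) equilibrium.
    print(simulate_lipm(0.0, 0.0, 0.0, com_height=0.8, duration=0.5))
```

The divergence in the first case is why balance control keeps moving the ZMP. Shifting it ahead of the COM pushes the COM back, which is the quantity the trajectory optimization and balance stages plan and correct.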