most readable programming language to simulate 10,000 chutes and ladders game plays? - language-agnostic

I'm wondering what language would be most suitable to simulate the game Chutes and Ladders (Snakes and Ladders in some countries). I'm looking to collect basic stats, like average and standard deviation of game length (in turns), probability of winning based on turn order (who plays first, second, etc.), and anything else of interest you can think of. Specifically, I'm looking for the implementation that is most readable, maintainable, and modifiable. It also needs to be very brief.
If you're a grown-up and don't spend much time around young kids then you probably don't remember the game that well. I'll remind you:
There are 100 squares on the board.
Each player takes a turn spinning a random number from 1 to 6 (or throwing a die).
The player then advances that many squares.
Some squares are at the base of a ladder; landing on one of these squares means the player gets to climb the ladder, advancing the player's position to a predetermined square.
Some squares are at the top of a slide (chute or snake); landing on one of these squares means the player must slide down, moving the player's position back to a predetermined square.
Whichever player gets to position 100 first is the winner.

This is a bit rough, but it should work:
class Board
  attr_accessor :winner
  def initialize(players, &blk)
    @chutes, @ladders = {}, {}
    @players = players
    @move = 0
    @player_locations = Hash.new(0)
    self.instance_eval(&blk)
  end
  def chute(location)
    @chutes[location[:from]] = location[:to]
  end
  def ladder(location)
    @ladders[location[:from]] = location[:to]
  end
  def spin
    player = @move % @players
    die = rand(6) + 1
    location = (@player_locations[player] += die)
    # landing on the start of a chute or ladder moves the player to its end
    if (endpoint = @chutes[location] || @ladders[location])
      @player_locations[player] = endpoint
    end
    if @player_locations[player] >= 100
      @winner = player
    end
    @move += 1
  end
end
num_players = 4
board = Board.new num_players do
  ladder :from => 4, :to => 14
  ladder :from => 9, :to => 31
  # etc.
  chute :from => 16, :to => 6
  # etc.
end
until board.winner
  board.spin
end
puts "Player #{board.winner} is the winner!"

You should check out something along the lines of Ruby or Python. Both are basically executable pseudocode.
You might be able to get a shorter, more brilliant program with Haskell, but I would imagine Ruby or Python would probably be actually understandable.

For many statistics, you don't need to simulate. Using Markov Chains, you can reduce many problems to matrix operations on a 100x100-matrix, which only take about 1 millisecond to compute.
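A minimal sketch of the Markov chain idea, in plain Python rather than with matrix routines. It assumes a hypothetical `jumps` dictionary (start square to end square for each chute or ladder) and assumes that overshooting the last square counts as finishing, as the simulations in this thread do. It propagates the probability distribution over squares one spin at a time and uses E[T] = sum over t of P(T > t) to get the expected number of spins for a single player:

```python
def expected_turns(board_size, jumps, die_sides=6, tol=1e-12):
    """Expected number of spins for one player to reach square board_size.

    jumps maps the start of a chute or ladder to its end (hypothetical layout).
    Assumes the finish is reachable from every square, otherwise this loops forever.
    """
    # dist[s] = probability of standing on square s and not having finished yet
    dist = [0.0] * board_size
    dist[0] = 1.0
    remaining = 1.0   # P(game still running)
    expected = 0.0
    while remaining > tol:
        expected += remaining          # adds P(T > t) for the current t
        new_dist = [0.0] * board_size
        for square, prob in enumerate(dist):
            if prob == 0.0:
                continue
            for roll in range(1, die_sides + 1):
                target = square + roll
                target = jumps.get(target, target)   # follow chute or ladder
                if target < board_size:              # otherwise the player has finished
                    new_dist[target] += prob / die_sides
        dist = new_dist
        remaining = sum(dist)
    return expected


if __name__ == "__main__":
    # Two ladders and one chute taken from the Ruby example in this thread; not a full board.
    print(expected_turns(100, {4: 14, 9: 31, 16: 6}))
```

The same distribution also gives the probability that a game lasts longer than t spins, so the standard deviation and turn-order win probabilities can be derived without sampling.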

I'm going to disagree with some of the earlier posters, and say that an object oriented approach is the wrong thing to do here, as it makes things more complicated.
All you need is to track the position of each player, and a vector to represent the board. If the board position is empty of a chute or ladder, it is 0. If it contains a ladder, the board contains a positive number that indicates how many positions to move forward. If it contains a chute, it contains a negative number to move you back. Just track the number of turns and positions of each player.
The actual simulation with this method is quite simple, and you could do it in nearly any programming language. I would suggest R or Python, but only because those are the ones I use most these days.
I don't have a copy of chutes and ladders, so I made up a small board. You'll have to put in the right board:
#!/usr/bin/env python3
import random, numpy
board = [0, 0, 0, 3, 0, -3, 0, 1, 0, 0]
numplayers = 2
numruns = 100
def simgame(numplayers, board):
    winner = -1
    winpos = len(board)
    pos = [0] * numplayers
    turns = 0
    while max(pos) < winpos:
        turns += 1
        for i in range(0, numplayers):
            pos[i] += random.randint(1,6)
            # apply the chute or ladder offset stored on the landing square
            if pos[i] < winpos:
                pos[i] += board[pos[i]]
            # only the first player to reach the end is recorded as winner
            if pos[i] >= winpos and winner == -1:
                winner = i
    return (turns, winner)
# simulate games, then extract turns and winners
games = [simgame(numplayers, board) for x in range(numruns)]
turns = [n for (n, w) in games]
winner = [w for (t, w) in games]
pwins = [len([p for p in winner if p == i]) for i in range(numplayers)]
print("runs:", numruns)
print("mean(turns):", numpy.mean(turns))
print("sd(turns):", numpy.std(turns))
for i in range(numplayers):
    print("Player", (i+1), "won with proportion:", (float(pwins[i])/numruns))

F# isn't too ugly either; it's hard to beat a functional language for conciseness:
#light
open System
let snakes_and_ladders = dict [(1,30);(2,5);(20,10);(50,11)]
// one shared generator: creating a new Random() per roll can repeat values
let rng = Random()
let roll_dice(sides) =
    rng.Next(sides) + 1
let turn(starting_position) =
    let new_pos = starting_position + roll_dice(6)
    let found, after_snake_or_ladder = snakes_and_ladders.TryGetValue(new_pos)
    if found then after_snake_or_ladder else new_pos
let mutable player_positions = [0;0]
while List.max player_positions < 100 do
    player_positions <- List.map turn player_positions
// >= so that landing exactly on 100 counts as a win for player 1
if player_positions.Head >= 100 then printfn "Player 1 wins" else printfn "Player 2 wins"

I remember being in a TopCoder competition about 4 years ago where the question was: what is the probability that player 1 wins at snakes and ladders?
There was a really good write-up how to do the puzzle posted after the match. Can't find it now but this write-up is quite good
C/C++ seemed enough to solve the problem well.

Pick any object oriented language, they were invented for simulation.
Since you want it to be brief (why?), pick a dynamically typed language such as Smalltalk, Ruby or Python.


Are learning and cumulative reward good metrics to evaluate an RL model?

I am new to reinforcement learning.
I have a problem that I am using DQN on. I plotted a cumulative reward curve while the agent was learning and taking actions. After 100 episodes it shows a lot of fluctuation, so it does not tell me whether the agent has learnt anything.
However, instead of relying on the cumulative reward collected during learning, I ran the model through the whole simulation without the learning step after each episode, and that shows me that the model is actually learning well. This extended the program runtime by quite a bit.
In addition, I have to extract the best model along the way because the final model seems to perform badly at times.
Any advice or explanation for this?
Try using the average return; it is usually a good metric for telling whether the agent is improving or not.
If you're using TF-Agents (tf_agents) you can do something like this:
...
checkpoint_dir = os.path.join('./', 'checkpoint')
train_checkpointer = common.Checkpointer(
    ckpt_dir=checkpoint_dir,
    max_to_keep=1,
    agent=agent,
    policy=agent.policy,
    replay_buffer=replay_buffer,
    global_step=train_step
)
policy_dir = os.path.join('./', 'policy')
tf_policy_saver = policy_saver.PolicySaver(agent.policy)
def train_agent(n_iterations):
    best_AverageReturn = 0
    time_step = None
    policy_state = agent.collect_policy.get_initial_state(tf_env.batch_size)
    iterator = iter(dataset)
    for iteration in range(n_iterations):
        time_step, policy_state = collect_driver.run(time_step, policy_state)
        trajectories, buffer_info = next(iterator)
        train_loss = agent.train(trajectories)
        if iteration % 10 == 0:
            print("\r{} loss:{:.5f}".format(iteration, train_loss.loss.numpy()), end="")
        if iteration % 1000 == 0 and averageReturnMetric.result() > best_AverageReturn:
            best_AverageReturn = averageReturnMetric.result()
            train_checkpointer.save(train_step)
            tf_policy_saver.save(policy_dir)
Every 1000 iterations the train function checks the average return and creates a checkpoint if it has improved.
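The advice about average return can also be sketched without any RL library. The snippet below is only an illustration: `moving_average` and `best_episode` are hypothetical helper names, and the list of per-episode returns is made up. It smooths a noisy return curve with a trailing window and reports which episode had the best smoothed value, which is where a checkpoint would be worth keeping:

```python
from collections import deque


def moving_average(returns, window=10):
    """Trailing average of the last `window` episode returns."""
    recent = deque(maxlen=window)
    smoothed = []
    for episode_return in returns:
        recent.append(episode_return)
        smoothed.append(sum(recent) / len(recent))
    return smoothed


def best_episode(returns, window=10):
    """Index of the episode with the highest smoothed return."""
    smoothed = moving_average(returns, window)
    return max(range(len(smoothed)), key=lambda i: smoothed[i])


if __name__ == "__main__":
    noisy_returns = [1, 9, 2, 8, 3, 12, 5, 14, 4, 15]   # made-up numbers
    print(moving_average(noisy_returns, window=4))
    print(best_episode(noisy_returns, window=4))
```

Returns collected while the agent is still exploring (for example with epsilon-greedy actions) are noisier than returns from a separate greedy evaluation run, which may be why the two curves in the question look so different.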

Explanation behind actor-critic algorithm in pytorch example?

PyTorch provides a good example of using actor-critic to play CartPole in the OpenAI Gym environment.
I'm confused about several of their equations in the code snippet found at https://github.com/pytorch/examples/blob/master/reinforcement_learning/actor_critic.py#L67-L79:
saved_actions = model.saved_actions
value_loss = 0
rewards = []
for r in model.rewards[::-1]:
    R = r + args.gamma * R
    rewards.insert(0, R)
rewards = torch.Tensor(rewards)
rewards = (rewards - rewards.mean()) / (rewards.std() + np.finfo(np.float32).eps)
for (action, value), r in zip(saved_actions, rewards):
    action.reinforce(r - value.data.squeeze())
    value_loss += F.smooth_l1_loss(value, Variable(torch.Tensor([r])))
optimizer.zero_grad()
final_nodes = [value_loss] + list(map(lambda p: p.action, saved_actions))
gradients = [torch.ones(1)] + [None] * len(saved_actions)
autograd.backward(final_nodes, gradients)
optimizer.step()
What do r and value mean in this case? Why do they run REINFORCE on the action space with the reward equal to r - value? And why do they try to set the value so that it matches r?
Thanks for your help!
First the rewards are collected for a time, along with the state:action pairs that resulted in each reward.
Then r - value is the difference between the actual reward and the expected reward.
That difference is used to adjust the expected value of that action from that state.
So if in state "middle" the expected reward for action "jump" was 10 and the actual reward was only 2, then the AI was off by -8 (2 - 10). Reinforce means "adjust expectations". So if we adjust them by half, the new expected reward is 10 - (8 * 0.5), or 6, meaning the AI really thought it would get 10 for that, but now it's less confident and thinks 6 is a better guess. If the AI is not off by much, say by 2, the new estimate is 10 - (2 * 0.5) = 9, so it adjusts by a smaller amount.
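The arithmetic above can be checked with a small, framework-free sketch. The function names are hypothetical and this is a simplified picture, not the PyTorch example itself: `discounted_returns` mirrors the `R = r + gamma * R` loop from the question, `advantages` computes `r - value`, and `adjust_expectation` reproduces the "adjust by half" example:

```python
def discounted_returns(rewards, gamma):
    """Return-to-go for each step, computed backwards like the loop in the question."""
    running = 0.0
    returns = []
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.insert(0, running)
    return returns


def advantages(returns, values):
    """How much better (positive) or worse (negative) things went than the critic predicted."""
    return [r - v for r, v in zip(returns, values)]


def adjust_expectation(expected, actual, step=0.5):
    """Move the expectation a fraction `step` of the way towards what was observed."""
    return expected + step * (actual - expected)


if __name__ == "__main__":
    print(discounted_returns([1, 1, 1], gamma=0.5))
    print(advantages([2.0], [10.0]))        # the "off by -8" case
    print(adjust_expectation(10, 2))        # the "now thinks 6" case
```

In the quoted code, r is the (normalised) discounted return from that step onward and value is the critic's prediction for the same state, so a positive r - value makes the chosen action more likely and a negative one makes it less likely.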

How to see/extract the evolution of model flows when solving a model in a Scilab function?

I wrote the Lotka-Volterra model (prey-predator) as a Scilab function and solved it with ode. My problem is that I want to see how the flows evolve. I found a solution by including the flows in the resolution (in the script below for only one of them), but it's really "heavy". Does anyone have a better solution?
//parameters
x=[0.04,0.005,0.3,0.2]
n = x(1);//birth rate of prey
c = x(2);//capture rate
e = x(3);//energy from capture
m = x(4);//death rate of predator
Tmax=100; // maximum time of simulation
dt=0.01; // time step
t=0:dt:Tmax; //simulation time definition
Ci = [20,30,0]'; //initial conditions (including one for the flow)
//Lotka-Volterra model
function [dy]=LV(t, y, n, c, e, m)
    GrowthP = n * y(1)
    IngestC = c * y(1) * y(2)
    MortC = m * y(2)
    dy(1) = GrowthP - IngestC
    dy(2) = IngestC * e - MortC
    dy(3) = IngestC //here one flow in ode
endfunction
//resolution
sol=ode(Ci,0,t,LV)
//dataframe creation to store the data
soldef=zeros(3,10001);
//for row 3, from column 2 to the last one, take the value,
//subtract the value of the previous time step and store it in soldef
for i = 2:length(sol(3,:))
    soldef(3,i)=sol(3,i)-sol(3,i-1);
end
//complete the dataframe with the stocks
soldef(1:2,:)=sol(1:2,:)
Thanks for the interest and the time you give to my problem (and sorry for my English)
If you want animated display of your results, you can use the code below to plot more and more data on a graph:
[row,col]=size(soldef);
scf(0);
clf(0);
plot2d(soldef(:,1:2)',rect=[0,0,col,max(soldef)]); //plot first 2 points
en=gce(); //handle of the graphic entities
for i=3:col //plot more and more points
    en.children(1).data=[1:i;soldef(3,1:i)]'; //update data matrices
    en.children(2).data=[1:i;soldef(2,1:i)]';
    en.children(3).data=[1:i;soldef(1,1:i)]';
    //you can save the frames here...
    xpause(1e4); //wait some time if animation is too fast
end
If you save all the graphs (e.g. with xs2gif()), you can even use an external application (google for "gif maker") to make a .gif animation of your results to embed in a webpage or presentation.
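On the original question about the flows themselves: in this model every flow is an algebraic function of the two stocks, so one lighter option may be to solve only for the stocks and recompute the flows afterwards from the stored solution. The sketch below shows that idea in Python with a hand-written fixed-step RK4 integrator standing in for Scilab's ode; the helper names (`flows`, `rhs`, `rk4_step`, `simulate`) are hypothetical, and the parameters and initial conditions are the ones from the question:

```python
PARAMS = {"n": 0.04, "c": 0.005, "e": 0.3, "m": 0.2}


def flows(y, p=PARAMS):
    """Instantaneous flows (rates per unit time) for state y = [prey, predator]."""
    prey, pred = y
    return {
        "GrowthP": p["n"] * prey,
        "IngestC": p["c"] * prey * pred,
        "MortC": p["m"] * pred,
    }


def rhs(y, p=PARAMS):
    """Right-hand side of the Lotka-Volterra equations, built from the flows."""
    f = flows(y, p)
    return [f["GrowthP"] - f["IngestC"], f["IngestC"] * p["e"] - f["MortC"]]


def rk4_step(y, dt, p=PARAMS):
    """One classical Runge-Kutta step (a stand-in for Scilab's ode solver)."""
    def shifted(k, h):
        return [yi + h * ki for yi, ki in zip(y, k)]
    k1 = rhs(y, p)
    k2 = rhs(shifted(k1, dt / 2), p)
    k3 = rhs(shifted(k2, dt / 2), p)
    k4 = rhs(shifted(k3, dt), p)
    return [yi + dt / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def simulate(y0, dt, steps, p=PARAMS):
    """States at t = 0, dt, 2*dt, ..., steps*dt."""
    states = [list(y0)]
    for _ in range(steps):
        states.append(rk4_step(states[-1], dt, p))
    return states


states = simulate([20.0, 30.0], dt=0.01, steps=1000)
flow_series = [flows(y) for y in states]   # every flow at every stored time point
```

Note that these are rates; the differences stored in row 3 of soldef in the question are amounts per time step, which correspond approximately to the rate multiplied by dt.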

How to compute Fourier coefficients with MATLAB

I'm trying to compute the Fourier coefficients for a waveform using MATLAB. The coefficients can be computed using the following formulas:
T is chosen to be 1 which gives omega = 2pi.
However, I'm having issues performing the integrals. The functions are a triangle wave (which can be generated using sawtooth(t,0.5), if I'm not mistaken) and a square wave.
I've tried with the following code (For the triangle wave):
function [ a0,am,bm ] = test( numTerms )
b_m = zeros(1,numTerms);
w=2*pi;
for i = 1:numTerms
    f1 = @(t) sawtooth(t,0.5).*cos(i*w*t);
    f2 = @(t) sawtooth(t,0.5).*sin(i*w*t);
    am(i) = 2*quad(f1,0,1);
    bm(i) = 2*quad(f2,0,1);
end
end
However, it's not getting anywhere near the values I need. The b_m coefficients are given for a triangle wave and are supposed to be 1/m^2 and -1/m^2 when m is odd, alternating and beginning with the positive term.
The major issue for me is that I don't quite understand how integrals work in MATLAB and I'm not sure whether or not the approach I've chosen works.
Edit:
To clarify, this is the form in which I'm looking to write the function once the coefficients have been determined:
Here's an attempt using fft:
function [ a0,am,bm ] = test( numTerms )
T=2*pi;
w=1;
t = [0:0.1:2];
f = fft(sawtooth(t,0.5));
am = real(f);
bm = imag(f);
func = num2str(f(1));
for i = 1:numTerms
    func = strcat(func,'+',num2str(am(i)),'*cos(',num2str(i*w),'*t)','+',num2str(bm(i)),'*sin(',num2str(i*w),'*t)');
end
y = inline(func);
plot(t,y(t));
end
It looks to me like your problem is what sawtooth returns. The MathWorks documentation says:
sawtooth(t,width) generates a modified triangle wave where width, a scalar parameter between 0 and 1, determines the point between 0 and 2π at which the maximum occurs. The function increases from -1 to 1 on the interval 0 to 2πwidth, then decreases linearly from 1 to -1 on the interval 2πwidth to 2π. Thus a parameter of 0.5 specifies a standard triangle wave, symmetric about time instant π with peak-to-peak amplitude of 1. sawtooth(t,1) is equivalent to sawtooth(t).
So I'm guessing that's part of your problem.
After you responded I looked into it some more. Looks to me like it's the quad function; not very accurate! I recast the problem like this:
function [ a0,am,bm ] = sotest( t, numTerms )
bm = zeros(1,numTerms);
am = zeros(1,numTerms);
% 2L = 1
L = 0.5;
for ii = 1:numTerms
    am(ii) = (1/L)*quadl(@(x) aCos(x,ii,L),0,2*L);
    bm(ii) = (1/L)*quadl(@(x) aSin(x,ii,L),0,2*L);
end
ii = 0;
a0 = (1/L)*trapz( t, t.*cos((ii*pi*t)/L) );
% now let's test it
y = ones(size(t))*(a0/2);
for ii=1:numTerms
    y = y + am(ii)*cos(ii*2*pi*t);
    y = y + bm(ii)*sin(ii*2*pi*t);
end
figure; plot( t, y);
end
function a = aCos(t,n,L)
a = t.*cos((n*pi*t)/L);
end
function b = aSin(t,n,L)
b = t.*sin((n*pi*t)/L);
end
And then I called it like:
[ a0,am,bm ] = sotest( t, 100 );
and I got:
Sweetness!!!
All I really changed was from quad to quadl. I figured that out by using trapz which worked great until the time vector I was using didn't have enough resolution, which led me to believe it was a numerical issue rather than something fundamental. Hope this helps!
To troubleshoot your code, I would plot the functions you are using and investigate how the quad function samples them. You might be undersampling them, so make sure your minimum step size is smaller than the period of the function by at least a factor of 10.
I would suggest using the FFT that is built into MATLAB. Not only is the FFT the most efficient method to compute a spectrum (it is n*log(n) dependent on the length n of the array, whereas the integral is n^2 dependent), it will also automatically give you the frequency points that are supported by your (equally spaced) time data. If you compute the integral yourself (which might be needed if the data points are not equally spaced), you might calculate frequency data that are not resolved (closer spacing than 1 over the spacing in time, i.e. beyond the 'Fourier limit').
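For comparison, here is a small sketch of the direct integration in plain Python, assuming the intended waveform is a triangle with period T = 1 (so omega = 2*pi, as in the question). Because sawtooth has period 2*pi, a period-1 triangle corresponds to sawtooth(2*pi*t, 0.5) rather than sawtooth(t, 0.5); the `triangle` helper below is a hand-written stand-in for that, and `fourier_coefficients` is a hypothetical name. The integrals are approximated by a Riemann sum over exactly one period:

```python
import math


def triangle(t):
    """Period-1 triangle wave: -1 at t = 0, rising to +1 at t = 0.5, back to -1 at t = 1."""
    phase = t % 1.0
    return 4.0 * phase - 1.0 if phase < 0.5 else 3.0 - 4.0 * phase


def fourier_coefficients(f, num_terms, period=1.0, samples=2000):
    """Approximate a0 and a_m, b_m for m = 1..num_terms by summing over one period."""
    w = 2.0 * math.pi / period
    dt = period / samples
    ts = [k * dt for k in range(samples)]
    fs = [f(t) for t in ts]
    scale = 2.0 / period * dt          # the 2/T factor times the sample width
    a0 = scale * sum(fs)
    am = [scale * sum(v * math.cos(m * w * t) for v, t in zip(fs, ts))
          for m in range(1, num_terms + 1)]
    bm = [scale * sum(v * math.sin(m * w * t) for v, t in zip(fs, ts))
          for m in range(1, num_terms + 1)]
    return a0, am, bm


a0, am, bm = fourier_coefficients(triangle, 5)
```

For this phase of the triangle the series is purely in cosines: analytically a_m = -8/(pi^2 m^2) for odd m and zero otherwise, so am[0] comes out close to -0.8106 and the b_m are numerically zero. Shifting the wave by a quarter period moves the same magnitudes into the sine terms with alternating signs.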

Maximize function with fminsearch

Within my daily work, I have got to maximize a particular function making use of fminsearch; the code is:
clc
clear all
close all
f = @(x,c,k) -(x(2)/c)^3*(((exp(-(x(1)/c)^k)-exp(-(x(2)/c)^k))/((x(2)/c)^k-(x(1)/c)^k))-exp(-(x(3)/c)^k))^2;
c = 10.1;
k = 2.3;
X = fminsearch(@(x) f(x,c,k),[4,10,20]);
It works fine, as I expect, but now an issue has come up: I need to bound x within certain limits, as follows:
4 < x(1) < 5
10 < x(2) < 15
20 < x(3) < 30
To achieve the proper results, I should use the Optimization Toolbox, which I unfortunately do not have at hand.
Is there any way to get the same analysis by making use of only fminsearch?
Well, not using fminsearch directly, but if you are willing to download fminsearchbnd from the file exchange, then yes. fminsearchbnd does a bound constrained minimization of a general objective function, as an overlay on fminsearch. It calls fminsearch for you, applying bounds to the problem.
Essentially the idea is to transform your problem for you, in a way that your objective function sees as if it is solving a constrained problem. It is totally transparent. You call fminsearchbnd with a function, a starting point in the parameter space, and a set of lower and upper bounds.
For example, minimizing the Rosenbrock function with fminsearch returns a minimum at [1,1]. But if we apply purely lower bounds of 2 on each variable, then fminsearchbnd finds the bound-constrained solution at [2,4].
rosen = @(x) (1-x(1)).^2 + 105*(x(2)-x(1).^2).^2;
fminsearch(rosen,[3 3]) % unconstrained
ans =
1.0000 1.0000
fminsearchbnd(rosen,[3 3],[2 2],[]) % constrained
ans =
2.0000 4.0000
If you have no constraints on a variable, then supply -inf or inf as the corresponding bound.
fminsearchbnd(rosen,[3 3],[-inf 2],[])
ans =
1.4137 2
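The transformation idea can be sketched in a few lines. This is only an illustration of the general technique, assuming a sine mapping for variables that are bounded on both sides; it is not a copy of fminsearchbnd, and `to_bounded` and `bounded_objective` are hypothetical names. The unconstrained optimizer works on u, while the objective only ever sees x values inside the box:

```python
import math


def to_bounded(u, lower, upper):
    """Map unconstrained values u to x with lower[i] <= x[i] <= upper[i]."""
    return [lo + (hi - lo) * (math.sin(ui) + 1.0) / 2.0
            for ui, lo, hi in zip(u, lower, upper)]


def bounded_objective(f, lower, upper):
    """Wrap f so that any unconstrained simplex search can minimise it over the box."""
    def wrapped(u):
        return f(to_bounded(u, lower, upper))
    return wrapped


if __name__ == "__main__":
    lower, upper = [4, 10, 20], [5, 15, 30]     # the bounds from the question
    print(to_bounded([0.0, 0.0, 0.0], lower, upper))
    print(to_bounded([100.0, -3.0, 7.5], lower, upper))
```

After the search finishes in u, applying `to_bounded` once more to the result gives the solution in the original variables.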
Andrey has the right idea, and the smoother way of providing a penalty isn't hard: just add the distance to the equation.
To keep using the anonymous function:
f = @(x,c,k, Xmin, Xmax) -(x(2)/c)^3*(((exp(-(x(1)/c)^k)-exp(-(x(2)/c)^k))/((x(2)/c)^k-(x(1)/c)^k))-exp(-(x(3)/c)^k))^2 ...
    + (x< Xmin)*(Xmin' - x' + 10000) + (x>Xmax)*(x' - Xmax' + 10000) ;
The most naive way to bound x, would be giving a huge penalty for any x that is not in the range.
For example:
function res = f(x,c,k)
if x(1)>5 || x(1)<4
    penalty = 1000000000000;
else
    penalty = 0;
end
res = penalty - (x(2)/c)^3*(((exp(-(x(1)/c)^k)-exp(-(x(2)/c)^k))/((x(2)/c)^k-(x(1)/c)^k))-exp(-(x(3)/c)^k))^2;
end
You can improve this approach, by giving the penalty in a smoother way.
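One possible "smoother" penalty is quadratic in the distance outside the bounds, so the objective grows gradually instead of jumping. The sketch below is an assumption about what such a penalty could look like, written in Python; `quadratic_penalty` and `penalised` are hypothetical names, the weight is arbitrary, and `objective` is a direct transcription of the function from the question:

```python
import math


def objective(x, c=10.1, k=2.3):
    """The function from the question (already negated for minimisation)."""
    a, b, d = (x[0] / c) ** k, (x[1] / c) ** k, (x[2] / c) ** k
    inner = (math.exp(-a) - math.exp(-b)) / (b - a) - math.exp(-d)
    return -((x[1] / c) ** 3) * inner ** 2


def quadratic_penalty(x, lower, upper, weight=1e6):
    """Zero inside the box, growing with the squared distance outside it."""
    total = 0.0
    for xi, lo, hi in zip(x, lower, upper):
        violation = max(lo - xi, xi - hi, 0.0)
        total += weight * violation ** 2
    return total


def penalised(x, lower=(4, 10, 20), upper=(5, 15, 30)):
    """Objective plus penalty; this is what would be handed to the simplex search."""
    return objective(x) + quadratic_penalty(x, lower, upper)
```

Inside the bounds `penalised` equals the original objective, so the optimum is unchanged whenever it lies in the interior of the box.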