Value iteration example maybe wrong? - reinforcement-learning

In this link, the agent moves stochastically: with probability 0.7 it moves in the chosen direction and with probability 0.1 in each of the other three directions. If it goes off the map, it incurs a -1 payoff and stays in the same cell.
The reason I think something is wrong is this: look at the upper left corner. In V1 we can choose either right or down. Say we choose down (and say the discount factor is 0.9):
V1(S(0, 0)) = 0.7 * (0 + 0.9 * 0) + 0.1 * (0 + 0.9 * 0) + 0.1 * (-1 + 0.9 * 0) + 0.1 * (-1 + 0.9 * 0).
On the right-hand side, the terms are in this order: going down, going right, going left, going up.
Notice that although the agent chooses to go down, the other terms represent the stochasticity of the outcome. Does that make sense?
The other question: how is V1(S(1, 1)) equal to 9.8? Shouldn't it be a combination of the nearby cells, or am I missing something?
Thanks!

From Reddit: the 3x3 grid that is shown is just a small part of the larger grid displayed in example 9.26, so the top left corner of the 3x3 grid is not actually surrounded by walls. Example 9.26 also explains what happens when the agent reaches the +10 tile.
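The one-step backup written out in the question can be checked numerically. This is only a sketch under the question's own assumptions (walls to the left of and above the cell, all neighbouring values still 0, discount 0.9); `expected_backup` is a hypothetical helper name, not something from the linked example.

```python
def expected_backup(outcomes, gamma):
    """Expected value of one action.

    outcomes is a list of (probability, reward, value_of_next_state).
    """
    return sum(p * (r + gamma * v) for p, r, v in outcomes)


gamma = 0.9
# Action "down" from the corner cell, as assumed in the question.
outcomes_down = [
    (0.7, 0.0, 0.0),   # moves down as intended
    (0.1, 0.0, 0.0),   # slips right
    (0.1, -1.0, 0.0),  # slips left, hits the wall, stays in place
    (0.1, -1.0, 0.0),  # slips up, hits the wall, stays in place
]
v1_down = expected_backup(outcomes_down, gamma)
print(v1_down)  # roughly -0.2 under these assumptions
```

If the corner is not actually walled in, as the reply above says, the two -1 rewards become 0 and the same backup gives 0.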

Related

PCA in 2D calculate center point in original data

I'm trying to create a bounding box around a given dataset.
My idea was to use PCA. I read that it won't always find the optimal solution, but that doesn't matter.
What I've done so far is calculate the covariance matrix and compute an SVD of it.
Let's say we have a sample input like
[40, 20], [-40, -20],[40, -20],[-40, 20],[30, 30]
The covariance matrix will become
[1780.0, 180.0] [180.0, 580.0]
With the SVD I get the rotation matrix U:
[0.99, 0.15]
[0.15, -0.99]
and the diagonal matrix D:
[1806.41, 0]
[0, 553.58]
With the eigenvectors I'm able to calculate the slope of the lines representing the box.
I now need the center of the PCA in the original space, not in the zero-centered space.
I also need to find the lengths of those two vectors.
Does anyone have an idea how to get them?

Interesting question. Just some thoughts.
Is the centre you are referring to the mean of the data?
Think of it this way: if we project (0,0) back to the original space, we get the mean.
To find the lengths, assuming you want every point to be inside the box, you can project every point onto each principal component direction and record the largest and smallest coordinates. The difference is the length along that direction.
By the way, I am under the impression that PCA on the correlation matrix is usually the more appropriate choice, and I think that applies to your question too.
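The projection idea can be sketched as follows. This is a rough illustration rather than the asker's code: it uses the sample covariance and, instead of an SVD, the closed-form principal-axis angle of a symmetric 2x2 matrix. The function name `pca_box` and its return layout are assumptions. Note that the mean maps back from (0,0), while the box centre is the midpoint of the min/max extents, so the two can differ.

```python
import math


def pca_box(points):
    """Oriented bounding box of 2D points along the principal axes."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]

    # Sample covariance entries (divide by n - 1).
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Angle of the major axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    u1 = (math.cos(theta), math.sin(theta))
    u2 = (-math.sin(theta), math.cos(theta))

    # Coordinates of every centered point along each axis.
    t1 = [x * u1[0] + y * u1[1] for x, y in centered]
    t2 = [x * u2[0] + y * u2[1] for x, y in centered]
    lo1, hi1, lo2, hi2 = min(t1), max(t1), min(t2), max(t2)

    def to_original(a, b):
        # Back from axis coordinates to the original, uncentered space.
        return (mx + a * u1[0] + b * u2[0], my + a * u1[1] + b * u2[1])

    corners = [to_original(lo1, lo2), to_original(hi1, lo2),
               to_original(hi1, hi2), to_original(lo1, hi2)]
    center = to_original((lo1 + hi1) / 2.0, (lo2 + hi2) / 2.0)
    return corners, center, (hi1 - lo1, hi2 - lo2), (u1, u2)


pts = [(40, 20), (-40, -20), (40, -20), (-40, 20), (30, 30)]
corners, center, lengths, axes = pca_box(pts)
print(axes[0], lengths, center)
```

For the sample input from the question, the first axis comes out close to the first column of U quoted there.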

I found a solution.
The idea was to use the two eigenvectors to calculate the maximum distance of all points to each of them.
The maximum distance will then be half the length of the rectangle's width and height, as shown in the picture below.
To position the rectangle I calculate the 4 points by
p1.x = max1 * eigenvector1(0) + max2 * eigenvector1(1)
p1.y = max1 * eigenvector2(0) + max2 * eigenvector2(1)
for all points.
Then I just had to translate the vertices (and all data points) by meanX and meanY, and the rectangle encloses the original dataset.

The problem with the solution above was that using just max was not the best idea, because the box is then only minimal in one direction along each eigenvector.
By using min and max I'm now able to create minimal enclosing boxes in both directions of the principal components.
To calculate the points I used the code below, where minDistX is the absolute value of the minimum distance:
p1.setX(minDist2 * U[0][0] + maxDist1 * U[0][1]);
p1.setY(minDist2 * U[1][0] + maxDist1 * U[1][1]);
p2.setX(minDist2 * U[0][0] - minDist1 * U[0][1]);
p2.setY(minDist2 * U[1][0] - minDist1 * U[1][1]);
p3.setX(-(maxDist2 * U[0][0] + minDist1 * U[0][1]));
p3.setY(-(maxDist2 * U[1][0] + minDist1 * U[1][1]));
p4.setX(-(maxDist2 * U[0][0] - maxDist1 * U[0][1]));
p4.setY(-(maxDist2 * U[1][0] - maxDist1 * U[1][1]));

calculating the point of acceleration

I've been struggling to calculate the accelerator. I've spent a whole day searching and on trial and error, all in vain. I have one horizontal line on the stage (AS3) of, let's say, width 200. The center point of that line is at 60 (if it were 100, I would surely have done it by just calculating the percentage). Now I need to know the width of a given percentage. For example, what is the total width of 60%, or where will 30% (or any other percentage) start?
What I know is the total width and the center point (either as a percentage or as a width).
Your help will be highly appreciated. If there is a formula, please give me details; don't just mention a/b/c, as I've never been a student of physics :(
Edit:
I don't have 10 reputation, so I can't post the image directly here. Please click the following link to see the image.
Link: http://oi62.tinypic.com/11sk183.jpg
Edit:
Here is what I want exactly: I want to travel n% from any point (A/B/C/D) to its relative point (A->B/A->D ...) (Link)
http://i59.tinypic.com/2wp2lbl.jpg

If I understand correctly, you want a non-linear scale, so that pixel 1 on the line is 0%, pixel 100 on the line is 60% and pixel 200 is 100%?
If x=pixelpos/200 is the relative position on the line, one easy variation of the linear scale y=x*100% is y=(x+a*x*(1-x))*100%.
For x=0.5 the value is y=0.5+a*0.25, so for that to be 0.6=60% one needs a=0.4.
To get in the reverse direction the x for y=0.3=30%, one needs to solve a quadratic equation y=x*(1+a*(1-x)) or a*x^2-(1+a)*x+y=0. With the general solution formula, this gives
x = (1+a)/(2*a)-sqrt((1+a)^2-4*a*y)/(2*a)
= (2*y) / ( (1+a) + sqrt((1+a)^2-4*a*y) )
= (2*y) / ( (1+a) + sqrt((1-a)^2+4*a*(1-y)) )
and with a=0.4 and y=0.3
x = 0.6/( 1.4 + sqrt(1.96-0.48) )
approx 0.6/2.6=3/13=231/1001 approx 0.23
corresponding to pixel 46.
This will only work for a between -1 and 1, since for other values the slope at x=0 or x=1 will not be positive.
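A small sketch of the parabolic scale and its inverse, using the rationalized form of the root so that a = 0 also works. The function names are made up for illustration; x and y are fractions in [0, 1], and the line width of 200 comes from the question.

```python
import math


def solve_a(x_mid, y_mid):
    """Choose a so that y(x_mid) == y_mid for y = x + a*x*(1 - x)."""
    return (y_mid - x_mid) / (x_mid * (1.0 - x_mid))


def forward(x, a):
    """Relative position x on the line -> fraction y."""
    return x + a * x * (1.0 - x)


def inverse(y, a):
    """Fraction y -> relative position x (valid for -1 <= a <= 1)."""
    return 2.0 * y / ((1.0 + a) + math.sqrt((1.0 + a) ** 2 - 4.0 * a * y))


a = solve_a(0.5, 0.6)         # middle of the line should read 60%
x30 = inverse(0.3, a)         # where does 30% start?
print(a, x30, round(x30 * 200))
```

With these inputs a comes out at about 0.4 and the 30% mark lands near pixel 46, in line with the hand calculation.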
Another simple formula uses hyperbola instead of parabola,
y=a*x/(1+(a-1)*x)
with the inversion by
y+(a-1)*x*y = a*x <=> y = (a-(a-1)*y)*x
x = (y/a)/(1+(1/a-1)*y)
and
a = (y*(1-x))/(x*(1-y))
here there is no problem with monotonicity as long as there is no pole for x in [0,1], which is guaranteed for a>0.
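The hyperbolic variant can be sketched the same way. Again the function names are only illustrative, and a is fitted from one known pair (x, y), here the midpoint reading 60%.

```python
def fit_a(x, y):
    """a such that y = a*x / (1 + (a - 1)*x) passes through (x, y)."""
    return (y * (1.0 - x)) / (x * (1.0 - y))


def hyper_forward(x, a):
    """Relative position x -> fraction y."""
    return a * x / (1.0 + (a - 1.0) * x)


def hyper_inverse(y, a):
    """Fraction y -> relative position x."""
    return (y / a) / (1.0 + (1.0 / a - 1.0) * y)


a_h = fit_a(0.5, 0.6)
x30_h = hyper_inverse(0.3, a_h)
print(a_h, x30_h)
```

Both endpoints stay fixed (0 maps to 0 and 1 maps to 1) for any a > 0, so only the midpoint needs to be fitted.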

Shadow that appears as ball gets closer to ground

I have developed a simple basketball game. The ball can be thrown and bounces off the ground, walls, rim, and backboard. I would like to have a ball shadow appear when the ball gets close to the ground. I've done my research and am coming up with nothing. So far I have had a limited amount of success with the following, running in the update loop of course:
shadow.alpha = _activeBall.y/600;
shadow.x = _activeBall.x;
600 is the floor, the maximum y value the ball can fall. The code above gets me close, but the shadow is always present, even when I am high enough in the air that a shadow should not be seen. I tried something like this:
if( _activeBall.y > 450 ) shadow.alpha = _activeBall.y/600;
shadow.x = _activeBall.x;
but that pops the shadow in too abruptly. It would also be ideal if the scale of the shadow decreased as the ball moved away from the floor. I am stumped by the math on this one and was hoping someone here could recommend an approach. Come on math gurus! Whatcha got?

Ok, so you need to learn the skill of normalizing / rescaling. It's not tough, as long as you know what you want.
It looks like you want alpha to be 0 when _activeBall.y < 450, then go from something less than 450/600 to eventually 600/600. Since I don't know what you want your alpha to start at (0 or 0.25 seems logical), I'll just call it alpha0. The good news for algebra - it lets you use a variable. :-)
So, the first easy part:
if( _activeBall.y < 450 ) { shadow.alpha = 0; }
// I ALWAYS use {} in any control flow statement.
So, normalize the range you want: turn it into a number from 0 to 1. Do this by subtracting the initial value and dividing by the total range.
var normalized = (_activeBall.y - 450) / (600 - 450);
Note the (), otherwise you'd calculate _activeBall.y - (450/150).
Now that you have the normalized value, apply it to your scale by doing the opposite operation with your new scale: multiply it by the total range and add the initial value:
shadow.alpha = normalized * (1 - alpha0) + alpha0;
Incidentally, you can do similarly with the scale of the shadow, so that when it's higher up, it's smaller. The normalized value will still be the same, and you can just set the scaleX and scaleY with that value.
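Putting the pieces together, a minimal sketch of the rescaling might look like this. It is written in Python rather than ActionScript purely for illustration; the 450 and 600 thresholds come from the question, while `alpha0`, the clamping, and the function names are assumptions.

```python
def normalized_height(ball_y, start_y=450.0, floor_y=600.0):
    """0 at start_y (shadow just appears), 1 at the floor, clamped."""
    t = (ball_y - start_y) / (floor_y - start_y)
    return max(0.0, min(1.0, t))


def shadow_alpha(ball_y, alpha0=0.0, start_y=450.0, floor_y=600.0):
    """Alpha is 0 above start_y, then ramps from alpha0 up to 1."""
    if ball_y < start_y:
        return 0.0
    t = normalized_height(ball_y, start_y, floor_y)
    return t * (1.0 - alpha0) + alpha0


def shadow_scale(ball_y, start_y=450.0, floor_y=600.0):
    """Reuse the same normalized value for scaleX / scaleY."""
    return normalized_height(ball_y, start_y, floor_y)


for y in (300, 450, 525, 600):
    print(y, shadow_alpha(y), shadow_scale(y))
```

With `alpha0 = 0` the shadow fades in without a jump at y = 450; a larger `alpha0` makes it appear more suddenly.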

How can I better pack rectangles tangent to a sphere for a 3d gallery?

I am creating a 3D sphere gallery with ActionScript 3 and the Flash 10 3D (2.5D) APIs. I have found a method that works but is not ideal. I would like to see if there is a better method.
My algorithm goes like this:
Let n = the number of images, h = the height of each image, w = the width of each image.
1. Approximate the radius of the sphere by assuming (incorrectly) that the surface area of the images is equal to the surface area of the sphere we want to create. To calculate the radius, solve for r in nwh = 4πr^2. This is the part that needs to be improved.
2. Calculate the angle between rows: rowAngle = 2 * atan(h / 2 / r).
3. Calculate the number of rows: rows = floor(π / rowAngle).
4. Because step one is an approximation, the number of rows will not fit perfectly, so for presentation add padding to rowAngle: rowAngle += (π - rowAngle * rows) / rows.
5. For each i in rows:
   a. Calculate the radius of the circle of latitude for the row: latitudeRadius = radius * cos(π / 2 - rowAngle * i).
   b. Calculate the angle between columns: columnAngle = atan(w / 2 / latitudeRadius) * 2.
   c. Calculate the number of columns: columns = floor(2 * π / columnAngle).
   d. Because step one is an approximation, the number of columns will not fit perfectly, so for presentation add padding to columnAngle: columnAngle += (2 * π - columnAngle * columns) / columns.
   e. For each j in columns, translate -radius along the Z axis, rotate π / 2 + rowAngle * i around the X axis, and rotate columnAngle * j around the Y axis.
To see this in action, click here. alternate link. Notice that with the default settings, the number of items actually on the sphere is 13 fewer than requested. I believe this is the error introduced by my approximation in the first step.
I am not able to figure out a method for determining what the exact radius of such a sphere should be. I'm hoping to learn either a better method, the correct method, or that what I am trying to do is hard or very hard (in which case I will be happy with what I have).
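The counting part of the steps above can be sketched as below (the 3D transforms are left out). One assumption differs from the description: rows are centred at rowAngle * (i + 0.5) so that no row sits exactly on the pole, where the latitude radius would be zero. Function names are illustrative.

```python
import math


def area_estimate_radius(n, w, h):
    """Step 1: solve n*w*h = 4*pi*r^2 for r."""
    return math.sqrt(n * w * h / (4.0 * math.pi))


def items_for_radius(radius, w, h):
    """Count how many w-by-h images the row/column scheme places."""
    row_angle = 2.0 * math.atan(h / 2.0 / radius)
    rows = math.floor(math.pi / row_angle)
    row_angle += (math.pi - row_angle * rows) / rows  # pad rows evenly

    total = 0
    for i in range(rows):
        # Assumption: row centre half a step away from the pole.
        polar = row_angle * (i + 0.5)
        # cos(pi/2 - polar) is the same as sin(polar).
        latitude_radius = radius * math.sin(polar)
        column_angle = 2.0 * math.atan(w / 2.0 / latitude_radius)
        total += math.floor(2.0 * math.pi / column_angle)
    return total


r = area_estimate_radius(100, 1.0, 1.0)
fitted = items_for_radius(r, 1.0, 1.0)
print(r, fitted)
```

For 100 unit squares this sketch also places fewer items than requested at the area-estimate radius, which is the same kind of shortfall described above.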

I would divide this problem into two connected problems.
Given a radius, how do you pack things on to the sphere?
Given a number of things, how do you find the right radius?
If you have a solution to the first problem, the second is easy to solve. Here it is in pseudo-code.
    # grow from a radius that is known to be too small until everything fits
    upperRadius = somethingTooSmall
    while itemsForRadius(upperRadius) < wantedItems:
        upperRadius *= 2
    # the previous (halved) radius did not fit, so the answer lies in between
    lowerRadius = upperRadius / 2
    while threshold < upperRadius - lowerRadius:
        middleRadius = (upperRadius + lowerRadius)/2
        if itemsForRadius(middleRadius) < wantedItems:
            lowerRadius = middleRadius
        else:
            upperRadius = middleRadius
This will find the smallest radius that will pack the desired number of things with your packing algorithm. If you wish you could start with a better starting point - your current estimate is pretty close. But I don't think that an analytic formula will do it.
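A runnable sketch of this doubling-then-bisection search follows. The packing function is passed in as a parameter; `toy_items` is only a stand-in based on the surface-area estimate from the question, not a real packing algorithm, and the search assumes the count does not decrease as the radius grows.

```python
import math


def smallest_radius(items_for_radius, wanted, start=1.0, threshold=1e-6):
    """Smallest radius (within threshold) that fits `wanted` items.

    `start` must be too small to fit them, otherwise the bracket is wrong.
    """
    upper = start
    while items_for_radius(upper) < wanted:
        upper *= 2.0
    lower = upper / 2.0  # last radius that was still too small
    while upper - lower > threshold:
        middle = (upper + lower) / 2.0
        if items_for_radius(middle) < wanted:
            lower = middle
        else:
            upper = middle
    return upper


def toy_items(radius, w=1.0, h=1.0):
    # Stand-in packing count: sphere area divided by image area.
    return math.floor(4.0 * math.pi * radius * radius / (w * h))


radius_100 = smallest_radius(toy_items, 100)
print(radius_100)
```

Swapping `toy_items` for a real row/column counter only changes the function that is passed in; the search itself stays the same.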
Now let's turn to the first problem. You have a very reasonable approach. It does have one serious bug though. The bug is that your columnAngle should not be calculated for the middle of your row. What you need to do is figure out the latitude which your items are in that is closest to the pole, and use that for the calculation. This is why when you try to fit 10 items you find a packing that causes the corners to overlap.
If you want a denser packing, you can try squishing rows towards the equator. This will result in sometimes having room for more items in a row so you'll get more things in a smaller sphere. But visually it may not look as nice. Play with it, and decide whether you like the result.
BTW I like the idea. It looks nice.

In the case of squares, there seems to be an approximate formula relating the radius, the square's side and the number of squares that fit.
Following this, the number of squares is:
Floor[4 Pi/Integrate[r (x^2 + y^2 + r^2)^(-3/2), {x, -a/2, a/2}, {y, -a/2, a/2}]]
or
Floor[Pi/ArcCot[(2 Sqrt[2] r Sqrt[a^2+2 r^2])/a^2]]
where
r = Radius
a = Square side
The integral is the solid angle covered by one square tangent to the sphere, so the count depends only on the ratio a/r.
If you plot for r=1, as a function of a:
Where you can see the case a=2 is the boundary for n=6, meaning a cube:
Still working to see if it can be extended to the case of a generic rectangle.
Edit
For rectangles, the corresponding formula is:
Floor[4 Pi/Integrate[r (x^2 + y^2 + r^2)^(-3/2), {x, -a/2, a/2}, {y, -b/2, b/2}]]
which gives:
Floor[(2 Pi)/(Pi-2 ArcTan[(2 r Sqrt[a^2+b^2+4 r^2])/(a b)])]
where
r = Radius
a,b = Rectangle sides
Let's suppose we want rectangles with one side half of the other (b = a/2) and a sphere of radius 1.
So, the number of rectangles as a function of a gives:
Where you may see that a rectangle with a "large" side of size 2 allows 9 rectangles on the sphere (the formula gives about 9.76 before taking the floor), while a rectangle of "large" side 4 allows only 4 rectangles.
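The bound can be evaluated numerically with the standard solid angle of an a-by-b rectangle seen from distance r along its axis, which is what the integral works out to for r = 1. This is only a sketch; the function names are assumptions, and the result is an upper bound on the count, not an actual packing.

```python
import math


def rectangle_solid_angle(a, b, r):
    """Solid angle of an a-by-b rectangle tangent to a sphere of radius r."""
    return 4.0 * math.atan(a * b / (2.0 * r * math.sqrt(a * a + b * b + 4.0 * r * r)))


def max_rectangles(a, b, r):
    """Upper bound: full sphere (4*pi) divided by one rectangle's solid angle."""
    return math.floor(4.0 * math.pi / rectangle_solid_angle(a, b, r))


# Unfloored ratio for a square of side 2 on a unit sphere (the cube case).
cube_ratio = 4.0 * math.pi / rectangle_solid_angle(2.0, 2.0, 1.0)
print(cube_ratio)
print(max_rectangles(2.0, 1.0, 1.0), max_rectangles(4.0, 2.0, 1.0))
```

The cube case sits exactly on a floor boundary, so the sketch prints the unfloored ratio there rather than relying on the rounding of a floating-point value.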

Problem retrieving pixel color on color picker

I'm currently making a color picker (pretty standard one, pretty much the same as photoshop with less options at the moment: still in early stage). Here's the picture of the actual thing : http://i.stack.imgur.com/oEvJW.jpg
The problem is : to retrieve the color of the pixel that is under the color selector (the small one, the other is the mouse), I have this line that I thought would do it :
_currentColor = Convert.hsbToHex(new HSB(0,
((_colorSelector.x + _colorSelector.width/2)*100)/_largeur,
((_colorSelector.y + _colorSelector.height/2)*100)/_hauteur
));
Just to clarify the code, I simply use the coordinates of the selector in order to create a new HSB Color (saturation is represented on the X axis and brightness (value) on the Y axis of such a color picker). I then convert this HSB Color to Hexadecimal and assign it to a property. The hue is always set to 0 at the moment but this is irrelevant as I only work with pure red to test.
It partially does what I wanted, but the returned color values are inverted for most of the corners:
for (0,0) it's supposed to return 0xFFFFFF, but it returns 0x000000 instead
for (256, 0) it's supposed to return 0xFF0000, but it returns 0x000000 instead
for (0, 256) it's supposed to return 0x000000, but it returns 0xFFFFFF instead
for (256, 256) it's supposed to return 0x000000, but it returns 0xFF0000 instead
I tried many variations in my code, but I just can't seem to fix it properly. Any replies or suggestions are more than welcome!

I think the error (or one of them) is using values in the range 0..256 which seems to lead to overflows, try to use 0..255 instead.

Just swap the X and Y axis and it's solved.

Assuming the registration point is centered, which seems to be the case since you're doing:
(_colorSelector.x + _colorSelector.width/2)
I think your formula should look something like this:
(_colorSelector.x + _colorSelector.width/2) / _colorSelector.width
If your registration point is at (0,0), it should be just:
(_colorSelector.x / _colorSelector.width);
The above should give you a number in the range 0...1
Also, you should invert this value for brightness (because a low y value represents a high brightness and a high y value, low brightness; so brightness decreases along the y axis, while saturation increases along the x axis). So for your y axis you should do:
1 - ((_colorSelector.y + _colorSelector.height/2) / _colorSelector.height)
(Again, assuming the registration point is centered).
If your conversion function expects percentages, then you should multiply by 100
(_colorSelector.x + _colorSelector.width/2) / _colorSelector.width * 100
(1 - ((_colorSelector.y + _colorSelector.height/2) / _colorSelector.height)) * 100
Maybe I'm missing something, though. I'm not sure where _largeur and _hauteur come from, but it looks like these are width and height. I think you should use the _colorSelector height and width, but I could be wrong.
PS: I hope you get the idea, because I haven't compiled the above code and maybe I messed up some parentheses or made some other dumb mistake.
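To make the inversion concrete, here is a sketch in Python using the standard `colorsys` module in place of the question's `Convert.hsbToHex`. It assumes the selector's centre position is divided by the size of the picker area (which is what `_largeur` and `_hauteur` appear to be); the function and parameter names are made up for illustration.

```python
import colorsys


def picker_color(center_x, center_y, picker_width, picker_height, hue=0.0):
    """Map the selector centre to a 0xRRGGBB value.

    Saturation grows along x; brightness is inverted along y
    (top of the picker is bright, bottom is dark).
    """
    saturation = min(max(center_x / picker_width, 0.0), 1.0)
    brightness = 1.0 - min(max(center_y / picker_height, 0.0), 1.0)
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, brightness)
    return (round(r * 255) << 16) | (round(g * 255) << 8) | round(b * 255)


for corner in ((0, 0), (256, 0), (0, 256), (256, 256)):
    print(corner, hex(picker_color(corner[0], corner[1], 256, 256)))
```

With pure red as the hue, the four corners come out as white, red, black and black, which are the values the question says it expects.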
