I'm trying to build a reinforcement learning project for a popular Portuguese card game. I have the environment working: the code can run on its own, for n rounds, using random picks.
For a quick explanation: the game is similar to Hearts. It has 4 players in 2 teams. Each player has 10 cards and must follow suit when possible, playing until no cards are left. Each trick earns points; after 10 tricks, the team with more than 60 points wins the round.
First, I had some doubts about how to "encode" the cards/deck. I'm passing the model the cards on the table, the cards in hand, and the cards already played. I one-hot encoded the deck: 4 suits, 10 ranks (numbers/figures). I also have to include the trump (taken at the beginning of the round). As an example, one card looks like [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], where the first four numbers are the suit, the last is the trump flag (boolean), and the rest is the rank (2, 3, 4, 5, 6, J, Q, K, 7, A). I'm passing the whole deck for each of hand, table, and played, i.e., 40 cards × 3 views with 15 features each (1800 features when flattened). Possibly this is not the best approach; if someone could advise on this, that would be great.
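For concreteness, here is a sketch of that 15-feature encoding (TypeScript; the rank order follows the list above, and all names are illustrative):

const SUITS = 4;  // one-hot positions 0-3
const RANKS = 10; // one-hot positions 4-13: 2, 3, 4, 5, 6, J, Q, K, 7, A
// suit and rank are indices into the one-hot groups; trumpSuit is the round's trump.
function encodeCard(suit: number, rank: number, trumpSuit: number): number[] {
  const v: number[] = new Array(SUITS + RANKS + 1).fill(0);
  v[suit] = 1;                                    // one-hot suit
  v[SUITS + rank] = 1;                            // one-hot rank
  v[SUITS + RANKS] = suit === trumpSuit ? 1 : 0;  // trump flag
  return v;
}
// encodeCard(3, 7, 2) -> [0,0,0,1, 0,0,0,0,0,0,0,1,0,0, 0], the example card above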
With this data, I'm setting one player as the AI agent. When it's the AI's turn, the state is the cards in hand, the cards on the table, and the cards already played. For the next state I pass the end of the trick, with the same type of data. The reward from that trick is a cumulative value (max 120 points). When I run the code, the loss outputs a value the first time, then displays NaN, so the predictions come out as NaN too. (I have 10 outputs, one for each card in hand. I also have some doubts here, because players start with 10 cards, but as the game goes on and cards get played, that number drops to zero.)
Here's the code for the training:
async expReplay() {
  console.debug('Training...')
  // Crude shuffle of a copy of the replay memory, then take one minibatch.
  const minibatch = this.memory.concat().sort(() => .5 - Math.random()).slice(0, this.batchSize)
  for (let i = 0; i < minibatch.length; i++) {
    let [state, action, reward, next_state, done] = minibatch[i]
    state = tf.concat(state).flatten().reshape([1, this.stateSize])
    next_state = tf.concat(next_state).flatten().reshape([1, this.stateSize])
    let target = reward
    if (!done) {
      const predictNext = this.model.predict(next_state)
      // The Bellman target needs the max predicted Q-value;
      // argMax() would return its index instead.
      target = reward + this.gamma * predictNext.max().dataSync()[0]
    }
    const qValues = this.model.predict(state).dataSync()
    qValues[action] = target
    const target_f = tf.tensor2d(qValues, [1, this.actionSize])
    await this.model.fit(state, target_f, {
      epochs: 1,
      verbose: 1,
      callbacks: {
        onEpochEnd: (epoch, logs) => {
          process.stdout.write(`${logs.loss} ${logs.acc} \r`)
        }
      }
    })
    state.dispose()
    next_state.dispose()
    target_f.dispose()
  }
  if (this.epsilon > this.epsilonMin) {
    this.epsilon *= this.epsilonDecay
  }
  return 'Training... stop!'
}
I've used this loop before, on a DQN for a Bitcoin trader I was experimenting with, and it worked fine, so I'm guessing my data is wrong somewhere. I've logged state and next_state to check for NaN but didn't catch any...
If you need more info please ask!
I am using WebHID in Chrome to communicate with a USB-enabled digital scale. I'm able to connect to the scale and subscribe to a stream of weight data as follows:
// Get a reference to the scale.
// 0x0922 is the vendor of my particular scale (Dymo).
let device = await navigator.hid.requestDevice({filters:[{vendorId: 0x0922}]});
// Open a connection to the scale.
await device[0].open();
// Subscribe to scale data inputs at a regular interval.
device[0].addEventListener("inputreport", event => {
  const { data, device, reportId } = event;
  let buffArray = new Uint8Array(data.buffer);
  console.log(buffArray);
});
I now receive regular input in the format Uint8Array(5) [2, 12, 255, 0, 0], where the fourth position is the weight data. If I put something on the scale, it changes to Uint8Array(5) [2, 12, 255, 48, 0] which is 4.8 pounds.
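For reference, the reading can be decoded from those bytes. A sketch (TypeScript), assuming byte 2 is a signed base-10 exponent (255 = -1) and bytes 3-4 are a little-endian raw weight; these field meanings are inferred from the samples above, not from a verified spec:

device[0].addEventListener("inputreport", (event: any) => {
  const bytes = new Uint8Array(event.data.buffer);
  const exponent = (bytes[2] << 24) >> 24;     // sign-extend int8: 255 -> -1
  const raw = bytes[3] | (bytes[4] << 8);      // little-endian 16-bit weight
  const pounds = raw * Math.pow(10, exponent); // 48 * 10^-1 = 4.8
  console.log(`${pounds} lb`);
});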
I would like to zero (tare) the scale so that its current, encumbered state becomes the new zero point. After a successful zeroing, I would expect the scale to start returning Uint8Array(5) [2, 12, 255, 0, 0] again. My current best guess at this is:
device[0]
  .sendReport(0x02, new Uint8Array([0x02]))
  .then(response => { console.log("Sent output report " + response) });
This is based on a table from the HID Point of Sale Usage Tables (the table itself is not reproduced here).
The first byte is the Report ID, which is 2 as per the table. For the second byte, I want the ZS operation set to 1, thus 00000010, thus also 2. sendReport takes the Report Id as the first parameter, and an array of all following data as the second parameter. When I send this to the device, it isn't rejected, but it doesn't zero the scale, and response is undefined.
How can I zero this scale using WebHID?
So I ended up in a very similar place: trying to programmatically zero a USB scale. Setting the ZS bit did not seem to do anything. I used Wireshark plus the Stamps.com app to see how they were doing it, and noticed that what was actually sent was the Enforced Zero Return, i.e., 0x02 0x01 (Report ID = 2, EZR). I now have this working.
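In WebHID terms that is the following (note that sendReport resolves with undefined on success, which also explains the undefined response seen in the question):

device[0]
  .sendReport(0x02, new Uint8Array([0x01])) // Report ID 2, payload 0x01 = EZR bit
  .then(() => console.log("Zero (EZR) command sent"));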
In this TensorFlow detection model zoo they list COCO mAP scores for the different detection architectures. They also say that the higher the mAP score, the higher the accuracy. What I don't understand is how this is calculated. What is the maximum score it can have? And why does the mAP score differ from dataset to dataset?
To understand MAP (Mean Average Precision), I would start with AP (Average Precision) first.
Suppose we are searching for images of a flower and we provide our image retrieval system a sample picture of a rose (the query). We get back a bunch of ranked images (from most likely to least likely). Usually not all of them are correct, so we compute the precision at every correctly returned image, and then take an average.
Example:
If our returned result is 1, 0, 0, 1, 1, 1, where 1 is an image of a flower and 0 is not, then the precision at each correct point is:
Precision at each correct image = 1/1, 2/4, 3/5, 4/6 (the incorrect results contribute no terms)
Summation of these precisions = 1/1 + 2/4 + 3/5 + 4/6 = 83/30
Average Precision = (precision summation)/(total correct images) = (83/30)/4 = 83/120
Side note:
This section provides a detailed explanation behind the calculation of precision at each correct image in case you're still confused by the above fractions.
For illustration purposes, let 1, 0, 0, 1, 1, 1 be stored in an array so results[0] = 1, results[1] = 0 etc.
Let totalCorrectImages = 0, totalImagesSeen = 0, pointPrecision = 0
The formula for pointPrecision is totalCorrectImages / totalImagesSeen
At results[0], totalCorrectImages = 1, totalImagesSeen = 1 hence pointPrecision = 1
Since results[1] != 1, we ignore it but totalImagesSeen = 2 && totalCorrectImages = 1
Since results[2] != 1, totalImagesSeen = 3 && totalCorrectImages = 1
At results[3], totalCorrectImages = 2, totalImagesSeen = 4 hence pointPrecision = 2/4
At results[4], totalCorrectImages = 3, totalImagesSeen = 5 hence pointPrecision = 3/5
At results[5], totalCorrectImages = 4, totalImagesSeen = 6 hence pointPrecision = 4/6
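That walkthrough translates directly into code. A minimal sketch (TypeScript, names illustrative):

// results holds 1 for a correct image, 0 for an incorrect one, in ranked order.
function averagePrecision(results: number[]): number {
  let correct = 0;
  let precisionSum = 0;
  results.forEach((r, i) => {
    if (r === 1) {
      correct += 1;
      precisionSum += correct / (i + 1); // precision at this correct image
    }
  });
  return correct === 0 ? 0 : precisionSum / correct;
}
// averagePrecision([1, 0, 0, 1, 1, 1]) === 83/120 ≈ 0.692, matching the example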
A simple way to interpret this is to produce a combination of zeros and ones that gives the required AP. For example, an AP of 0.5 could come from results like 0, 1, 0, 1, 0, 1, ..., where every second image is correct, while an AP of 0.333 comes from 0, 0, 1, 0, 0, 1, 0, 0, 1, ..., where every third image is correct. For an AP of 0.1, every 10th image is correct, which is definitely a bad retrieval system. On the other hand, for an AP above 0.5, we encounter more correct images than wrong ones among the top results, which is definitely a good sign.
MAP is just an extension of AP: you take the average of the AP scores over a set of queries. The above interpretation of AP scores also holds for MAP. MAP ranges from 0 to 1 (often reported as a percentage from 0 to 100); higher is better.
AP formula on Wikipedia
MAP formula on Wikipedia
Credits to this blog
EDIT I:
The same concept applies to object detection. In this scenario you calculate the AP for each class, given by the area under the precision-recall curve for that class, and then average the per-class APs to obtain the mAP.
For more details, refer to sections 3.4.1 and 4.4 of the 2012 Pascal VOC Dev Kit. The related paper can be found here.
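For the detection case, a rough sketch of the per-class computation (TypeScript; note that Pascal VOC actually uses an interpolated AP rather than the plain trapezoidal area below, so this only conveys the idea):

// (recall, precision) points for one class, sorted by increasing recall.
function areaUnderPR(recall: number[], precision: number[]): number {
  let area = 0;
  for (let i = 1; i < recall.length; i++) {
    area += (recall[i] - recall[i - 1]) * (precision[i] + precision[i - 1]) / 2;
  }
  return area;
}
// mAP: mean of the per-class AP values.
const meanAveragePrecision = (aps: number[]) =>
  aps.reduce((s, ap) => s + ap, 0) / aps.length;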
I need your help. I want to use an FFT on my audio file: I want to cut the audio file into smaller buffers and run the FFT on each sub-buffer.
Why? Because I need to see (by plotting the data) how the frequency content behaves over time; I want to find out where a noise starts in my audio file.
Here is my FFT code. I don't know what I'm doing wrong.
Thanks for your help.
EDITED CODE
func FFT(buffer: AVAudioPCMBuffer) {
    let frameCount = buffer.frameCapacity
    let log2n = UInt(round(log2(Double(frameCount))))
    print("log2n \(log2n)")
    let bufferSizePOT = Int(1 << log2n)
    print("bufferSizePot \(bufferSizePOT)")
    let inputCount = bufferSizePOT / 2
    let fftSetup = vDSP_create_fftsetup(log2n, Int32(kFFTRadix2))

    var realp = [Float](repeating: 0, count: inputCount)
    var imagp = [Float](repeating: 0, count: inputCount)
    var output = DSPSplitComplex(realp: &realp, imagp: &imagp)

    let windowSize = bufferSizePOT
    var transferBuffer = [Float](repeating: 0, count: windowSize)
    var window = [Float](repeating: 0, count: windowSize)

    // Apply a Hann window before the FFT.
    vDSP_hann_window(&window, vDSP_Length(windowSize), Int32(vDSP_HANN_NORM))
    vDSP_vmul((buffer.floatChannelData?.pointee)!, 1, window,
              1, &transferBuffer, 1, vDSP_Length(windowSize))

    // Pack the real samples into the split-complex buffer.
    transferBuffer.withUnsafeBufferPointer { pointer in
        pointer.baseAddress!.withMemoryRebound(to: DSPComplex.self,
                                               capacity: transferBuffer.count) { typeConvertedTransferBuffer in
            vDSP_ctoz(typeConvertedTransferBuffer, 2, &output, 1, vDSP_Length(inputCount))
        }
    }

    // Do the fast Fourier forward transform, packed input to packed output.
    vDSP_fft_zrip(fftSetup!, &output, 1, log2n, FFTDirection(FFT_FORWARD))

    var magnitudes = [Float](repeating: 0.0, count: inputCount)
    vDSP_zvmags(&output, 1, &magnitudes, 1, vDSP_Length(inputCount))

    // Element-wise square root (vvsqrtf from Accelerate), then scale.
    var rootMagnitudes = [Float](repeating: 0.0, count: inputCount)
    vvsqrtf(&rootMagnitudes, magnitudes, [Int32(inputCount)])

    var normalizedMagnitudes = [Float](repeating: 0.0, count: inputCount)
    vDSP_vsmul(rootMagnitudes, 1, [2.0 / Float(inputCount)],
               &normalizedMagnitudes, 1, vDSP_Length(inputCount))

    for f in 0..<normalizedMagnitudes.count {
        print("\(f), \(normalizedMagnitudes[f])")
    }

    vDSP_destroy_fftsetup(fftSetup)
}
Basically, instead of making
frameCount = UInt32(audioFile.length)
you want to use a much smaller frameCount (such as 4096, for roughly 1/10th of a second of audio), and then loop through the audio file, reading and processing each smaller chunk of frameCount samples (doing shorter windowed FFTs) separately, instead of processing the entire file in one huge FFT.
Note that when looping over the audio file sample data and doing a series of FFTs of the same length, you only need to do the fftSetup once.
If you want, you can vector sum sets of resulting magnitude vectors over a longer time period to reduce the time resolution and length of your printout. This is a form of Welch's method of spectral density estimation.
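The chunking idea itself is language-agnostic. A sketch in TypeScript, where fftMagnitudes stands in for the question's window + FFT + magnitudes pipeline and is passed in rather than implemented here:

function shortTimeFFT(
  samples: Float32Array,
  fftMagnitudes: (chunk: Float32Array) => Float32Array, // your FFT routine
  frameCount = 4096
): Float32Array[] {
  const spectra: Float32Array[] = [];
  // One short windowed FFT per chunk instead of one huge FFT over the file.
  for (let start = 0; start + frameCount <= samples.length; start += frameCount) {
    spectra.push(fftMagnitudes(samples.subarray(start, start + frameCount)));
  }
  return spectra; // one magnitude spectrum per ~1/10 s of audio at 44.1 kHz
}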
(my code is written in Java but the question is agnostic; I'm just looking for an algorithm idea)
So here's the problem: I made a method that simply finds the median of a data set (given in the form of an array). Here's the implementation:
public static double getMedian(int[] numset) {
    ArrayList<Integer> anumset = new ArrayList<Integer>();
    for (int num : numset) {
        anumset.add(num);
    }
    anumset.sort(null);
    int n = anumset.size();
    if (n % 2 == 1) {
        // Odd count: the middle element is the median.
        return anumset.get(n / 2);
    } else {
        // Even count: average the two middle elements (2.0 forces double division).
        return (anumset.get(n / 2 - 1) + anumset.get(n / 2)) / 2.0;
    }
}
A teacher in the school that I go to then challenged me to write a method to find the median again, but without using any data structures. This includes anything that can hold more than one value, so that includes Strings, any forms of arrays, etc. I spent a long while trying to even conceive of an idea, and I was stumped. Any ideas?
The usual algorithm for the task is Hoare's Select algorithm. This is pretty much like a quicksort, except that in quicksort you recursively sort both halves after partitioning, but for select you only do a recursive call in the partition that contains the item of interest.
For example, let's consider an input like this in which we're going to find the fourth element:
[ 7, 1, 17, 21, 3, 12, 0, 5 ]
We'll arbitrarily use the first element (7) as our pivot. We initially split the input like this (with the pivot marked with a *):
[ 1, 3, 0, 5, ] *7, [ 17, 21, 12]
We're looking for the fourth element, and 7 ended up as the fifth element, so we then partition (only) the left side. We'll again use the first element as our pivot, giving (using { and } to mark the part of the input we're now ignoring):
[ 0 ] 1 [ 3, 5 ] { 7, 17, 21, 12 }
1 has ended up as the second element, so we need to partition the items to its right (3 and 5):
{0, 1} 3 [5] {7, 17, 21, 12}
Using 3 as the pivot element, we end up with nothing to the left, and 5 to the right. 3 is the third element, so we need to look to its right. That's only one element, so that (5) is our median.
By ignoring the unused side, this reduces the complexity from the O(n log n) of sorting to O(n) [though I'm abusing the notation a bit; in this case we're dealing with expected behavior rather than the worst case that big-O normally describes].
There's also a median of medians algorithm if you want to assure good behavior (at the expense of being somewhat slower on average).
This gives guaranteed O(N) complexity.
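A compact sketch of the select algorithm described above (TypeScript; Hoare-style partition, k is 0-indexed, and the array is modified in place):

function quickselect(a: number[], k: number, lo = 0, hi = a.length - 1): number {
  if (lo === hi) return a[lo];
  const pivot = a[lo];
  let i = lo, j = hi;
  while (i <= j) {
    while (a[i] < pivot) i++;
    while (a[j] > pivot) j--;
    if (i <= j) { [a[i], a[j]] = [a[j], a[i]]; i++; j--; }
  }
  // Recurse only into the side that contains the k-th element.
  if (k <= j) return quickselect(a, k, lo, j);
  if (k >= i) return quickselect(a, k, i, hi);
  return a[k];
}
// quickselect([7, 1, 17, 21, 3, 12, 0, 5], 3) -> 5, the fourth element as above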
Sort the array in place. Take the element in the middle of the array as you're already doing. No additional storage needed.
That'll take n log n time or so in Java. Best possible time is linear (you've got to inspect every element at least once to ensure you get the right answer). For pedagogical purposes, the additional complexity reduction isn't worthwhile.
If you can't modify the array in place, you have to trade significant additional time complexity to avoid using additional storage proportional to half the input's size. (If you're willing to accept approximations, that's not the case.)
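The sort-in-place approach above is a few lines in most languages; for example (TypeScript):

function medianInPlace(a: number[]): number {
  a.sort((x, y) => x - y); // sorts the array in place
  const m = a.length >> 1;
  return a.length % 2 ? a[m] : (a[m - 1] + a[m]) / 2;
}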
Some not very efficient ideas:
For each value in the array, make a pass through the array counting the number of values lower than the current value. If that count is "half" the length of the array, you have the median. O(n^2) (Requires some thought to figure out how to handle duplicates of the median value.)
You can improve the performance somewhat by keeping track of the min and max values so far. For example, if you've already determined that 50 is too high to be the median, then you can skip the counting pass through the array for every value that's greater than or equal to 50. Similarly, if you've already determined that 25 is too low, you can skip the counting pass for every value that's less than or equal to 25.
In C++:
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

int Median(const std::vector<int> &values) {
  assert(!values.empty());
  const std::size_t half = values.size() / 2;
  int min = *std::min_element(values.begin(), values.end());
  int max = *std::max_element(values.begin(), values.end());
  for (auto candidate : values) {
    // Only candidates still inside the shrinking [min, max] window can be the median.
    if (min <= candidate && candidate <= max) {
      const std::size_t count =
          std::count_if(values.begin(), values.end(),
                        [&](int x) { return x < candidate; });
      if (count == half) return candidate;
      else if (count > half) max = candidate;
      else min = candidate;
    }
  }
  return min + (max - min) / 2;
}
Terrible performance, but it uses no data structures and does not modify the input array.
I'm wondering what language would be most suitable to simulate the game Chutes and Ladders (Snakes and Ladders in some countries). I'm looking to collect basic stats, like average and standard deviation of game length (in turns), probability of winning based on turn order (who plays first, second, etc.), and anything else of interest you can think of. Specifically, I'm looking for the implementation that is most readable, maintainable, and modifiable. It also needs to be very brief.
If you're a grown-up and don't spend much time around young kids then you probably don't remember the game that well. I'll remind you:
There are 100 squares on the board.
Each player in turn spins a random number from 1-6 (or rolls a die).
The player then advances that many squares.
Some squares are at the base of a ladder; landing on one of these squares means the player gets to climb the ladder, advancing the player's position to a predetermined square.
Some squares are at the top of a slide (chute or snake); landing on one of these squares means the player must slide down, moving the player's position back to a predetermined square.
Whichever player gets to position 100 first is the winner.
This is a bit rough, but it should work:
class Board
  attr_accessor :winner

  def initialize(players, &blk)
    @chutes, @ladders = {}, {}
    @players = players
    @move = 0
    @player_locations = Hash.new(0)
    self.instance_eval(&blk)
  end

  def chute(location)
    @chutes[location[:from]] = location[:to]
  end

  def ladder(location)
    @ladders[location[:from]] = location[:to]
  end

  def spin
    player = @move % @players
    die = rand(6) + 1
    location = (@player_locations[player] += die)
    if endpoint = @chutes[location] || endpoint = @ladders[location]
      @player_locations[player] = endpoint
    end
    if @player_locations[player] >= 100
      @winner = player
    end
    @move += 1
  end
end

num_players = 4

board = Board.new(num_players) do
  ladder :from => 4, :to => 14
  ladder :from => 9, :to => 31
  # etc.
  chute :from => 16, :to => 6
  # etc.
end

until board.winner
  board.spin
end

puts "Player #{board.winner} is the winner!"
You should check out something along the lines of Ruby or Python. Both are basically executable pseudocode.
You might be able to get a shorter, more brilliant program with Haskell, but I would imagine Ruby or Python would probably be actually understandable.
For many statistics, you don't need to simulate. Using Markov chains, you can reduce many of these problems to matrix operations on a 100×100 matrix, which take only about a millisecond to compute.
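A minimal sketch of that approach (TypeScript; the jump table and the "overshoot wastes the roll" rule are assumptions, as real boards and house rules vary):

const N = 101; // squares 0 (start) through 100 (finished, absorbing)
const jumps: Record<number, number> = { 4: 14, 9: 31, 16: 6 }; // hypothetical board

function transitionMatrix(): number[][] {
  const P = Array.from({ length: N }, () => new Array(N).fill(0));
  P[100][100] = 1; // absorbing state
  for (let s = 0; s < 100; s++) {
    for (let roll = 1; roll <= 6; roll++) {
      let t = s + roll;
      if (t > 100) t = s; // overshoot: stay put
      t = jumps[t] ?? t;  // apply chute/ladder
      P[s][t] += 1 / 6;
    }
  }
  return P;
}

// One turn of the chain: p' = pP.
function step(p: number[], P: number[][]): number[] {
  const q = new Array(N).fill(0);
  for (let s = 0; s < N; s++)
    for (let t = 0; t < N; t++) q[t] += p[s] * P[s][t];
  return q;
}

// E[turns] for one player = sum over t of P(not finished after t turns).
function expectedTurns(P: number[][]): number {
  let p = new Array(N).fill(0);
  p[0] = 1;
  let expected = 0;
  while (1 - p[100] > 1e-12) {
    expected += 1 - p[100];
    p = step(p, P);
  }
  return expected;
}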
I'm going to disagree with some of the earlier posters, and say that an object oriented approach is the wrong thing to do here, as it makes things more complicated.
All you need is to track the position of each player, plus a vector representing the board. If a board position has no chute or ladder, its entry is 0. If it holds a ladder, the entry is a positive number indicating how many positions to move forward; if it holds a chute, a negative number moving you back. Just track the number of turns and the position of each player.
The actual simulation with this method is quite simple, and you could do it in nearly any programming language. I would suggest R or python, but only because those are the ones I use most these days.
I don't have a copy of chutes and ladders, so I made up a small board. You'll have to put in the right board:
#!/usr/bin/python
import random, numpy

board = [0, 0, 0, 3, 0, -3, 0, 1, 0, 0]
numplayers = 2
numruns = 100

def simgame(numplayers, board):
    winner = -1
    winpos = len(board)
    pos = [0] * numplayers
    turns = 0
    while max(pos) < winpos:
        turns += 1
        for i in range(0, numplayers):
            pos[i] += random.randint(1, 6)
            if pos[i] < winpos:
                pos[i] += board[pos[i]]
            if pos[i] >= winpos and winner == -1:
                winner = i
    return (turns, winner)

# simulate games, then extract turns and winners
games = [simgame(numplayers, board) for x in range(numruns)]
turns = [n for (n, w) in games]
winner = [w for (t, w) in games]
pwins = [len([p for p in winner if p == i]) for i in range(numplayers)]

print("runs:", numruns)
print("mean(turns):", numpy.mean(turns))
print("sd(turns):", numpy.std(turns))
for i in range(numplayers):
    print("Player", (i + 1), "won with proportion:", (float(pwins[i]) / numruns))
F# isn't too ugly either; it's hard to beat a functional language for conciseness:

#light
open System

let snakes_and_ladders = dict [(1,30);(2,5);(20,10);(50,11)]

// One shared Random: constructing a new one per roll can repeat values.
let rng = Random()
let roll_dice(sides) = rng.Next(sides) + 1

let turn(starting_position) =
    let new_pos = starting_position + roll_dice(6)
    let found, after_snake_or_ladder = snakes_and_ladders.TryGetValue(new_pos)
    if found then after_snake_or_ladder else new_pos

let mutable player_positions = [0;0]

while List.max player_positions < 100 do
    player_positions <- List.map turn player_positions

if player_positions.Head >= 100 then printfn "Player 1 wins" else printfn "Player 2 wins"
I remember a TopCoder competition about 4 years ago where the question was the probability that player 1 wins at Snakes and Ladders.
There was a really good write-up of how to solve the puzzle posted after the match. I can't find it now, but this write-up is quite good.
C/C++ seemed sufficient to solve the problem well.
Pick any object-oriented language; they were invented for simulation.
Since you want it to be brief (why?), pick a dynamically typed language such as Smalltalk, Ruby or Python.