I am trying to create a model for the following problem
id  input (diagnoses)  elapsed_days  output (medication)
1   [2,3,4]            0             [3,4]
1   [4,5,6]            7             [1]
1   [2,3]              56            [6,3]
2   [6,5,9,10]         0             [5,3,1]
Rather than a single label for the different codes over time, there are labels at each time period.
I am thinking that my architecture would be [input] -> [embedding for diagnoses] -> [append normalized elapsed days to embeddings]
-> [LSTM] -> [FFNs] -> [labels over time]
I am familiar with how to set this up if there were a single label per id. Given that there are labels for each row (i.e. multiple per id), should I be passing the hidden states of the LSTM through the FFN and then assigning the labels? I would really appreciate it if somebody could point me to a reference/blog/GitHub repo/anything for this kind of problem, or suggest an alternative approach.
Assuming that [6, 3] is equal to [3, 6]:
You can use a Sigmoid activation with the Binary Cross-Entropy loss function (nn.BCELoss class) instead of Softmax Cross-Entropy (nn.CrossEntropyLoss class).
But the ground-truth targets can no longer be integer class indices as with nn.CrossEntropyLoss; you need to turn them into multi-hot vectors instead. For example, if the desired output is [6, 3] and the output layer has 10 nodes, then y_true has to be [0, 0, 0, 1, 0, 0, 1, 0, 0, 0].
Depending on how you implement your data generator, this is one way to do it.
import torch

output = [3, 6]
out_tensor = torch.zeros(10)  # one slot per medication code
out_tensor[output] = 1        # set the positions for codes 3 and 6
But if [6,3] is not equal to [3, 6]. Then more information about this is needed.
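For batching, the same multi-hot encoding can be written as a plain helper. This is a dependency-free sketch (the function name multi_hot is my own, and it assumes the codes are 0-indexed):

```python
def multi_hot(codes, num_classes):
    """Encode a list of label codes as a multi-hot target vector."""
    vec = [0.0] * num_classes
    for c in codes:
        vec[c] = 1.0  # order of the codes does not matter
    return vec

# [6, 3] and [3, 6] produce the same target vector
print(multi_hot([6, 3], 10))  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```

Because the encoding is order-insensitive, it also settles the [6, 3] vs [3, 6] question automatically.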
It is very common to use the softmax function to convert an array of values into an array of probabilities. In general, the function amplifies the probability of the greater values in the array.
However, this function is not scale invariant. Let us consider an example:
If we take an input of [1, 2, 3, 4, 1, 2, 3], the softmax of that is [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]. The output has most of its weight where the '4' was in the original input. That is, softmax highlights the largest values and suppresses values which are significantly below the maximum value. However, if the input were [0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3] (which sums to 1.6), the softmax would be [0.125, 0.138, 0.153, 0.169, 0.125, 0.138, 0.153]. This shows that for values between 0 and 1, softmax in fact de-emphasizes the maximum value (note that 0.169 is not only less than 0.475, it is also less than the initial proportion of 0.4/1.6 = 0.25).
I would need a function that amplifies differences between values in an array, emphasizing the greatest values and that is not so affected by the scale of the numbers in the array.
Can you suggest some function with these properties?
As Robert suggested in the comment, you can use temperature. Here is a toy realization in Python using numpy:
import numpy as np

def softmax(preds):
    exp_preds = np.exp(preds)
    sum_preds = np.sum(exp_preds)
    return exp_preds / sum_preds

def softmax_with_temperature(preds, temperature=0.5):
    preds = np.log(preds) / temperature
    preds = np.exp(preds)
    sum_preds = np.sum(preds)
    return preds / sum_preds

def check_softmax_scalability():
    base_preds = [1, 2, 3, 4, 1, 2, 3]
    base_preds = np.asarray(base_preds).astype("float64")
    for i in range(1, 3):
        print('logits: ', base_preds * i,
              '\nsoftmax: ', softmax(base_preds * i),
              '\nwith temperature: ', softmax_with_temperature(base_preds * i))
Calling check_softmax_scalability() would return:
logits: [1. 2. 3. 4. 1. 2. 3.]
softmax: [0.02364054 0.06426166 0.1746813 0.474833 0.02364054 0.06426166
0.1746813 ]
with temperature: [0.02272727 0.09090909 0.20454545 0.36363636 0.02272727 0.09090909
0.20454545]
logits: [2. 4. 6. 8. 2. 4. 6.]
softmax: [0.00188892 0.01395733 0.10313151 0.76204449 0.00188892 0.01395733
0.10313151]
with temperature: [0.02272727 0.09090909 0.20454545 0.36363636 0.02272727 0.09090909
0.20454545]
But the scale invariance comes with a cost: as you increase temperature, the output values will come closer to each other. Increase it too much, and you will have an output that looks like a uniform distribution. In your case, you should pick a low value for temperature to emphasize the maximum value.
You can read more about how temperature works here.
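Note that exp(log(x)/T) is just x**(1/T), and written that way the scale invariance is immediate: (c·x_i)^(1/T) = c^(1/T)·x_i^(1/T), so the constant factor cancels when you normalize. A dependency-free sketch of the same idea (the name power_softmax is mine; like the log-based version above, it assumes all inputs are positive):

```python
def power_softmax(values, temperature=0.5):
    """Equivalent to exp(log(v)/T) normalized; requires positive inputs."""
    powered = [v ** (1.0 / temperature) for v in values]
    total = sum(powered)
    return [p / total for p in powered]

a = power_softmax([1, 2, 3, 4, 1, 2, 3])
b = power_softmax([2, 4, 6, 8, 2, 4, 6])  # same logits, doubled
# a and b agree to floating-point precision
```

With temperature=0.5 this squares the inputs before normalizing, which reproduces the [0.0227, 0.0909, 0.2045, 0.3636, ...] values printed above for both scales.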
In Sublime, I'm trying to take the transpose of a row vector in an Octave file as such:
y = [4, 5, 6];
y_transpose = y';
But whenever I try to run this in Octave, it acts as if introduction of the transpose operator (the ') is the beginning of a string, and ignores the following lines of code. How can I remedy this?
I don't know why it isn't working. ' is actually listed as an operator in the docs. But as a workaround, you could use the transpose function.
y = [4, 5, 6];
y_transpose = transpose(y);
Though I should note that ' is the complex conjugate transpose. Normal transpose is .'. So maybe you should try:
y = [4, 5, 6];
y_transpose = y.';
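For a real-valued vector like [4, 5, 6] the two operators give identical results; the difference only shows up with complex entries, where ' also conjugates each element. A quick sketch of that distinction (in Python rather than Octave, purely to illustrate the semantics):

```python
y = [4 + 0j, 5 + 2j, 6 - 1j]

# Octave's y'  (complex conjugate transpose): conjugate each element
ctranspose = [v.conjugate() for v in y]

# Octave's y.' (plain transpose): elements are unchanged
plain = list(y)
```

So for complex data, pick ' or .' deliberately; for real data either is fine.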
I searched for this question, but found answers that weren't specific enough.
I'm cleaning up old code and I'm trying to make sure that the following is relatively clean, and hoping that it won't bite me on the rear later on.
My question is about passing a function through a function. Look at the "y" part of the following plot statement. The goo(df)[[1]](x) thing works, but am I asking for trouble in any way? If so, is there a cleaner way?
Also, if the goo() function is called many many times, for instance in a Monte Carlo analysis, will this load up R's internals or possibly cause some type of environment issues?
Edit (02/21/2011) --- The following code is just an example. The real function "goo" has a lot of code before it gets to the approxfun() statement.
#Build a dataframe
df <- data.frame(a=c(1, 2, 3, 4, 5), b=c(4, 3, 1, 2, 6))
#Build a function that passes a function
goo <- function(inp.df) {
  out.fun <- approxfun(x=inp.df$a, y=inp.df$b, yright=max(inp.df$b), method="linear", f=1)
  list(out.fun, inp.df$a[5], inp.df$b[5])
}
#Set up the plot range
x <- seq(1, 4.3, 0.01)
#Plot the function
plot(x, goo(df)[[1]](x), type="l", xlim=c(0, goo(df)[[2]]), ylim=c(0, goo(df)[[3]]), lwd=2, col="red")
grid()
goo(df)
[[1]]
function (v)
.C("R_approxfun", as.double(x), as.double(y), as.integer(n),
xout = as.double(v), as.integer(length(v)), as.integer(method),
as.double(yleft), as.double(yright), as.double(f), NAOK = TRUE,
PACKAGE = "stats")$xout
<environment: 0219d56c>
[[2]]
[1] 5
[[3]]
[1] 6
It's hard to give you specific recommendations without knowing exactly what your code is, but here are a few things to consider:
Is it really necessary to include pieces of goo's input data in its return value? In other words, can you make goo a straightforward factory that just returns a function? In your example, at least, the plot function already has all the data it needs to determine the limits.
If this is not possible, then stay with this pattern, but give the elements of goo's return value descriptive names so that at least it's easy to see what's going on when you reference them. (E.g., goo(df)$approx(x).) If this structure is used widely in your code, consider making it an S3 class.
Finally, don't invoke goo(df) multiple times in the plot function, just to get different elements out. When you do that, you literally call goo every time, which as you said will execute a lot of code. Also, each invocation will have its own environment with a copy of the input data (although R will be smart enough to reduce the copying to a certain extent and use the same physical instance of df.) Instead, call goo once, assign its value to a variable, and reference that variable subsequently.
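The "straightforward factory" pattern is language-agnostic; here is the same call-once-then-reuse shape sketched in Python (pure stdlib, with a hand-rolled linear interpolation standing in for approxfun, and helper names of my own):

```python
def make_interp(xs, ys):
    """Factory: returns a linear-interpolation function closed over xs/ys."""
    pairs = sorted(zip(xs, ys))

    def interp(x):
        if x <= pairs[0][0]:
            return pairs[0][1]
        for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        return pairs[-1][1]  # like yright: clamp on the right

    return interp

f = make_interp([1, 2, 3, 4, 5], [4, 3, 1, 2, 6])  # run the setup code once
# now reuse f as many times as needed without re-running the setup
```

The expensive part (sorting, or in the R case everything before approxfun) runs once in the factory; the returned closure is cheap to call repeatedly, which is exactly the point about not invoking goo(df) multiple times.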
I would remove a level of function handling and keep the input data out of the function generation. Then the function stays out of goo entirely, and approxfun is called only once.
It also generalizes to an input dataframe of any size, not just one with 5 rows.
#Build a dataframe
df <- data.frame(a=c(1, 2, 3, 4, 5), b=c(4, 3, 1, 2, 6))
#Build a function
fun <- approxfun(x = df$a, y = df$b, yright=max(df$b), method="linear", f = 1)
#Set up the plot range
x <- seq(1, 4.3, 0.01)
#Plot the function
plot(x, fun(x), type="l", xlim=c(0, max(df$a)), ylim=c(0, max(df$b)), lwd=2, col="red")
That might not be quite what you need ultimately, but it does remove a level of complexity and gives a cleaner starting point.
This might not be better in a big Monte Carlo simulation, but for simpler situations, it might be clearer to include the x and y ranges as attributes of the output from the created function instead of in a list with the created function. This way goo is a more straightforward factory, like Davor mentions. You could also make the result from your function an object (here using S3) so that it can be plotted more simply.
goo <- function(inp.df) {
  out.fun <- approxfun(x=inp.df$a, y=inp.df$b, yright=max(inp.df$b),
                       method="linear", f=1)
  xmax <- inp.df$a[5]
  ymax <- inp.df$b[5]
  function(x) {  # take x explicitly so data.frame(x=x, ...) doesn't rely on a global x
    structure(data.frame(x=x, y=out.fun(x)),
              limits=list(x=xmax, y=ymax),
              class=c("goo", "data.frame"))
  }
}
plot.goo <- function(x, xlab="x", ylab="approx",
                     xlim=c(0, attr(x, "limits")$x),
                     ylim=c(0, attr(x, "limits")$y),
                     lwd=2, col="red", ...) {
  plot(x$x, x$y, type="l", xlab=xlab, ylab=ylab,
       xlim=xlim, ylim=ylim, lwd=lwd, col=col, ...)
}
Then to make the function for a data frame, you'd do:
df <- data.frame(a=c(1, 2, 3, 4, 5), b=c(4, 3, 1, 2, 6))
goodf <- goo(df)
And to use it on a vector, you'd do:
x <- seq(1, 4.3, 0.01)
goodfx <- goodf(x)
plot(goodfx)
I'm looking to write a little comp-geom library, in Ruby.
I'm about to write the code for lines, and was wondering which line equation I should use:
ax + by + c = 0
r + tv (where r and v are vectors)
Thanks.
If using the classical equations is not a requirement, I'd suggest an array of four co-ordinates: xStart, yStart, xEnd and yEnd.
In case you need to make the line position dynamic, you could use an array of two parameters: alpha and radius. The former represents the rotation relative to the horizontal axis and the latter is the length of the line.
Yet another option would be vectors in the form of (X;Y).
Samples in C:
int endpointsLine[4] = {0, 0, 30, 40};
double radialLine[2] = {5.35589, 50};
int vectorLine[2] = {30, 40};
The "endpoints" format is fully compatible with modern line-drawing algorithms, such as Xiaolin Wu's line algorithm and Bresenham's line algorithm, but it represents specific screen co-ordinates, which is not the case with the "radial" and "vector" formats.
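Whichever storage format you pick, converting to the parametric r + tv form from the question is cheap, so the choice is mostly about what your algorithms consume. A small sketch of that conversion (in Python rather than Ruby, with helper names of my own):

```python
def endpoints_to_parametric(x0, y0, x1, y1):
    """Return (r, v) so that points on the segment are r + t*v for t in [0, 1]."""
    r = (x0, y0)             # a point on the line
    v = (x1 - x0, y1 - y0)   # direction vector
    return r, v

def point_at(r, v, t):
    """Evaluate r + t*v."""
    return (r[0] + t * v[0], r[1] + t * v[1])

# the "endpoints" sample {0, 0, 30, 40} from above
r, v = endpoints_to_parametric(0, 0, 30, 40)
midpoint = point_at(r, v, 0.5)  # (15.0, 20.0)
```

The parametric form also makes clipping and intersection tests straightforward, since they reduce to solving for t.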