Cox regression, how to get model predictions for each time point

I am new to survival analysis and I need a bit of your help. I made a dummy Cox proportional hazards model on the sample data as follows:
library(survival) # survival analysis
lung <- lung # copy the example dataset into the workspace
# bin follow-up time into ~100-day intervals and relabel them 1-12
lung$time <- cut(lung$time, breaks = c(-Inf, seq(min(lung$time), max(lung$time), by = 100), Inf))
levels(lung$time) <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
lung$time <- as.numeric(lung$time)
lung$status <- ifelse(lung$status == 1, 1, 0) # recode status to 0/1
lung$id <- c(1:nrow(lung))
lung <- lung[, c(11, 2:5)] # keep id, time, status, age, sex
lungTrain <- lung[1:180, ]
lungTest <- lung[180:228, ]
model <- coxph(Surv(time, status) ~ age + sex, data = lungTrain)
lungTest$pd <- predict(model, newdata = lungTest, type = "expected")
And it seems that the model produces something like a lifetime probability of the event. For example, in the lungTest dataset the data point with id = 180 has a predicted value of 0.0596 by time point 4. My question is: how could I produce the probability of the event for each time point up until 4? Something like: at time 1 the probability is 0.011, at time 2 it is 0.021, at time 3 it is 0.031, and finally at time 4 it is 0.0596. Thank you in advance!
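From the documentation, my understanding (which may well be off) is that type = "expected" returns each subject's cumulative hazard evaluated at that subject's own follow-up time, i.e. roughly
$$\Lambda(t \mid x) = \Lambda_0(t)\,\exp(x^\top \beta), \qquad P(\text{event by } t \mid x) = 1 - \exp\{-\Lambda(t \mid x)\},$$
so I suspect that what I actually need is the baseline cumulative hazard at every time point (e.g. from basehaz() or survfit() on the fitted model) rather than a single number per subject.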


Use LSTM to forecast Precipitation

I built an LSTM to forecast precipitation, but it doesn't work well.
My code is very simple and the data is very short, only 720 points.
I use MinMaxScaler to scale the data.
This is my code, with seq_len = 12:
from tensorflow.keras import Sequential, layers

SEQ_LEN = 12  # the seq_len mentioned above
model = Sequential([
    layers.LSTM(2, input_shape=(SEQ_LEN, 1)),
    layers.Dense(1)])
My data looks like this, and the model output compared with the true values looks like this.
I use Adam and the MAE loss function, with epochs = 10.
Is it underfitting, or is such a simple network just unable to do this job?
The r2_score is no more than 0.55.
Please tell me how to adjust it, thanks.
There are so many options:
First of all, it would be better to find an optimal window size by changing the length of the input sequences.
The second option would be changing the batch size of the dataset.
Change the optimizer to SGD because of the few data points, and before training define a good learning rate by setting a learning-rate-schedule callback.
Try another model architecture, e.g. one with convolutional layers.
Sometimes it helps model performance to add a Lambda layer after the last layer to scale up the values, because the LSTM's default activation function is tanh (see the sketch below).
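A rough sketch of two of the suggestions above (the SGD + learning-rate-schedule idea and the Lambda rescaling trick); the layer size, scale factor and schedule below are illustrative guesses, not tuned values:

import tensorflow as tf
from tensorflow.keras import Sequential, layers, callbacks

SEQ_LEN = 12
model = Sequential([
    layers.LSTM(16, input_shape=(SEQ_LEN, 1)),
    layers.Dense(1, activation="tanh"),
    layers.Lambda(lambda y: 2.0 * y),  # scale the tanh-bounded output
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01), loss="mae")

def halve_every_20(epoch, lr):
    # simple illustrative schedule: halve the learning rate every 20 epochs
    return lr * 0.5 if epoch > 0 and epoch % 20 == 0 else lr

lr_callback = callbacks.LearningRateScheduler(halve_every_20)
# model.fit(x_train, y_train, batch_size=16, epochs=100, callbacks=[lr_callback])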

How can the CNN model be trained with K-Fold Cross Validation

First of all, thank you in advance for the answers to come. I am confused about the use of K-Fold Cross Validation (CV) with CNN.
When working with CV under normal conditions, as seen in the link below, the original dataset is first split into test and training sets.
https://miro.medium.com/max/875/1*pJ5jQHPfHDyuJa4-7LR11Q.png
Then, within the K-fold cycle, the training dataset is repeatedly split into training and validation folds according to the chosen K. In short, if we say K = 5, the training is repeated 5 times, and each time a newly trained model is produced.
Question 1: How can we calculate the overall training/validation accuracy and loss values across the 5 different models? Do we need to add up and average the scores of all models?
Question 2: We separated the TEST dataset from the original dataset at the beginning of training. How can we evaluate the TEST dataset on 5 different models? Should we test on all 5 models and then take their average accuracy, or should we test only on the most successful model?
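To make my setup concrete, here is a minimal sketch of the K-fold loop I have in mind (the tiny CNN and the random data are placeholders only, not my real model or data):

import numpy as np
from sklearn.model_selection import KFold
from tensorflow.keras import Sequential, layers

def build_cnn():
    # deliberately tiny CNN, just so the loop runs
    m = Sequential([
        layers.Conv2D(8, 3, activation="relu", input_shape=(28, 28, 1)),
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),
    ])
    m.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
    return m

# placeholder data standing in for the training portion (the TEST set was held out earlier)
x_trainval = np.random.rand(100, 28, 28, 1).astype("float32")
y_trainval = np.random.randint(0, 10, size=100)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
val_accs = []
for train_idx, val_idx in kf.split(x_trainval):
    model = build_cnn()  # a fresh model for every fold
    model.fit(x_trainval[train_idx], y_trainval[train_idx], epochs=2, verbose=0)
    _, acc = model.evaluate(x_trainval[val_idx], y_trainval[val_idx], verbose=0)
    val_accs.append(acc)

print("per-fold validation accuracy:", val_accs)
print("mean validation accuracy:", float(np.mean(val_accs)))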

QAT output nodes for a quantized model get the same min/max range

Recently, I have been working on quantization-aware training (QAT) in TF 1.x to push a model to the Coral Dev Board. However, after I finished training the model, why do the fake-quantization nodes of my 2 outputs have the same min/max range?
Shouldn't they be different when one output's maximum target is 95 and the other's is 2*pi?
I have figured out the problem. It happens when that part of the model is not actually trained with QAT: the output node somehow does not get fake-quantized during training. The -6 and 6 values come from the defaults of TF 1.x quantization, as mentioned here.
To overcome the problem, we should add an op that triggers QAT for the output nodes. In my regression case, I added a dummy op, tf.maximum(output, 0), to the model so that the node gets fake-quantized. If your output is strictly between 0 and 1, applying a "sigmoid" activation at the output instead of ReLU can also solve the problem.
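For illustration, a rough TF 1.x-style sketch of this workaround, assuming the tf.contrib.quantize graph rewriter is used (the layer sizes and names here are made up; the point is only the tf.maximum dummy op on the output):

import tensorflow as tf  # TF 1.x, with tf.contrib available

graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, [None, 64], name="input")
    hidden = tf.layers.dense(x, 32, activation=tf.nn.relu)
    raw_output = tf.layers.dense(hidden, 2)  # regression head with 2 outputs
    # dummy op so the output node also gets fake-quant nodes with its own
    # learned min/max instead of the -6/6 defaults
    output = tf.maximum(raw_output, 0.0, name="output")

    # rewrite the graph with fake-quantization ops for QAT
    tf.contrib.quantize.create_training_graph(input_graph=graph, quant_delay=0)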

Epochs and Iterations in Deeplearning4j

I recently started learning Deeplearning4j and I fail to understand how the concept of epochs and iterations is actually implemented.
In the online documentation it says:
an epoch is a complete pass through a given dataset ...
Not to be confused with an iteration, which is simply one
update of the neural net model’s parameters.
I ran training using a MultipleEpochsIterator; for the first run I set 1 epoch, miniBatchSize = 1 and a dataset of 1000 samples, so I expected the training to finish after 1 epoch and 1000 iterations, but after more than 100,000 iterations it was still running.
int nEpochs = 1;
int miniBatchSize = 1;
MyDataSetFetcher fetcher = new MyDataSetFetcher(xDataDir, tDataDir, xSamples, tSamples);
//The same batch size set here was set in the model
BaseDatasetIterator baseIterator = new BaseDatasetIterator(miniBatchSize, sampleSize, fetcher);
MultipleEpochsIterator iterator = new MultipleEpochsIterator(nEpochs, baseIterator);
model.fit(iterator);
Then I did more tests changing the batch size, but that didn't change the frequency of the log lines printed by the IterationListener. I mean, I thought that if I increased the batch size to 100, then with 1000 samples I would have just 10 updates of the parameters and therefore just 10 iterations, but the logs and the timestamp intervals are more or less the same.
BTW, there is a similar question, but the answer does not actually answer my question; I would like to understand the actual details better:
Deeplearning4j: Iterations, Epochs, and ScoreIterationListener
None of this will matter after 1.x (which is already out in alpha) - we got rid of iterations long ago.
Originally it was meant to be shortcut syntax so folks wouldn't have to write for loops.
Just focus on for loops with epochs now.

How can we define an RNN/LSTM neural network with multiple outputs for the input at time "t"?

I am trying to construct an RNN to predict the probability of a player playing the match, along with the runs scored and wickets taken by the player. I would use an LSTM so that performance in the current match would influence the player's future selection.
Architecture summary:
Input features: Match details - Venue, teams involved, team batting first
Input samples: Player roster of both teams.
Output:
Discrete (binary): did the player play.
Discrete: wickets taken.
Continuous: runs scored.
Continuous: balls bowled.
Question:
Most often an RNN uses "softmax" or "MSE" in the final layer to process the activation "a" from the LSTM, providing only a single output variable "Y". But here there are four dependent variables (2 discrete and 2 continuous). Is it possible to stitch together all four as output variables?
If yes, how do we handle the mix of continuous and discrete outputs in the loss function?
(Although the LSTM output "a" has multiple features and carries that information to the next time step, we need multiple features at the output in order to train against the ground truth.)
You just do it. Without more detail on the software (if any) in use it is hard to give more detail.
The output of the LSTM unit at every time step is one of the hidden layers of your network.
You can then feed it into 4 output layers:
1. Sigmoid.
2. I would mess around with this a bit. Maybe 4x sigmoid (4 wickets to an innings, right?), or ReLU.
3, 4. Linear (squaring is also an option, or ReLU).
For training purposes your loss function is the sum of your 4 individual losses.
If they were all MSE you could concatenate your 4 outputs before calculating the loss.
But since the first is cross-entropy (for a sigmoid decision), you would calculate them separately and sum.
You can still concatenate the outputs afterwards to have a single output vector.
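For what it's worth, a minimal Keras-style sketch of this multi-head idea (the sequence length, feature count, layer sizes and names are illustrative assumptions, not part of the original answer):

from tensorflow.keras import Model, layers

# 10 time steps (matches), 8 input features per match -- placeholder sizes
seq = layers.Input(shape=(10, 8))
a = layers.LSTM(32, return_sequences=True)(seq)  # one hidden state per time step

played  = layers.Dense(1, activation="sigmoid", name="played")(a)   # did the player play
wickets = layers.Dense(1, activation="relu",    name="wickets")(a)  # wickets taken
runs    = layers.Dense(1, activation="linear",  name="runs")(a)     # runs scored
balls   = layers.Dense(1, activation="linear",  name="balls")(a)    # balls bowled

model = Model(seq, [played, wickets, runs, balls])
# the four per-head losses are summed into one training loss
model.compile(optimizer="adam",
              loss={"played": "binary_crossentropy",
                    "wickets": "mse", "runs": "mse", "balls": "mse"})

With named output layers like this, the training targets can be passed to model.fit as a dict keyed by those names.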