How do you infer topics with a supervised LDA (LLDA) model in MALLET?

I used MALLET's LabeledLDA class to build a model, which I saved to a binary file. I want to run my test data through it and see how well the model predicts the appropriate label.
I can only find documentation for unsupervised LDA, here under the "infer topics" section.
This is the command I'm using:
mallet infer-topics --inferencer fold_0_train.bin --input fold_0_test.txt
I'm getting a Java end-of-file exception.
How do you test the LLDA model on test data?
What format should the test data be in?
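One likely cause of the end-of-file exception, not confirmed for LabeledLDA specifically: `infer-topics` expects `--inferencer` to be a serialized topic inferencer (the file written by `train-topics --inferencer-filename`), and `--input` to be a MALLET instance list, not a raw text file. The raw test file is normally imported first with `import-file --use-pipe-from <training .mallet file>` so that it shares the training vocabulary and preprocessing. The sketch below only builds those two command lines in Python; the file names `fold_0_train.mallet`, `fold_0_test.mallet` and `fold_0_doc_topics.txt` are hypothetical, and whether a LabeledLDA model can be exported as such an inferencer is an assumption you would need to check.

```python
import shlex


def mallet_infer_commands(mallet_bin, train_instances, test_text, test_instances,
                          inferencer, doc_topics_out):
    """Build (but do not run) the MALLET import and inference commands.

    Assumptions: `train_instances` is the .mallet instance list the model was
    trained on, and `inferencer` is a serialized topic inferencer file.
    """
    # Import the raw test file with the same pipe (alphabet, tokenization,
    # stopword handling) as the training data, producing an instance list.
    import_cmd = [
        mallet_bin, "import-file",
        "--input", test_text,
        "--output", test_instances,
        "--use-pipe-from", train_instances,
    ]
    # Infer per-document topic proportions for the imported test instances.
    infer_cmd = [
        mallet_bin, "infer-topics",
        "--inferencer", inferencer,
        "--input", test_instances,
        "--output-doc-topics", doc_topics_out,
    ]
    return import_cmd, infer_cmd


if __name__ == "__main__":
    # Hypothetical file names; only fold_0_train.bin and fold_0_test.txt come from the question.
    cmds = mallet_infer_commands(
        "mallet", "fold_0_train.mallet", "fold_0_test.txt",
        "fold_0_test.mallet", "fold_0_train.bin", "fold_0_doc_topics.txt")
    for cmd in cmds:
        print(shlex.join(cmd))  # pass `cmd` to subprocess.run(cmd, check=True) to execute
```

The test file itself would then use the same line format that was used when importing the training data.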

Related

BentoML - Serving a CatBoostClassifier with cat_features

I am trying to create a BentoML service for a CatBoostClassifier model that was trained with one column as a categorical feature. If I save the model and make predictions with the saved model directly (not as a BentoML service), everything works as expected, but when I create the service using BentoML I get this error:
_catboost.CatBoostError: Bad value for num_feature[non_default_doc_idx=0,feature_idx=2]="Tertiary": Cannot convert 'b'Tertiary'' to float
The value is found in a column named 'road_type' and the model was trained using 'object' as the data type for the column.
If I try to give a float or an integer for the 'road_type' column I get the following error
_catboost.CatBoostError: catboost/libs/data/model_dataset_compatibility.cpp:53: Feature road_type is Categorical in model but marked different in the dataset
If someone has encountered the same issue and found a solution, I would appreciate it. Thanks!
I have tried different approaches to saving and loading the model, but unfortunately none of them worked.
You can try to explicitly pass the cat_features to the BentoML runner.
It would be something like this:
import bentoml
from catboost import Pool
runner = bentoml.catboost.get("bentoml_catboost_model:latest").to_runner()
cat_features = [2]  # indexes of your categorical columns (road_type is feature_idx=2 in the error)
prediction = runner.predict.run(Pool(input_data, cat_features=cat_features))  # input_data: the request data
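The indexes in `cat_features` must match the position of `road_type` in the feature order the model was trained on, and CatBoost accepts categorical values only as strings or integers, never floats. A minimal pure-Python sketch of preparing incoming rows before wrapping them in a `Pool`; the helper name `prepare_rows`, the dict-per-row request format and the columns other than `road_type` are hypothetical.

```python
def prepare_rows(rows, feature_names, cat_feature_names):
    """Return (data, cat_idx) ready for catboost.Pool(data, cat_features=cat_idx).

    rows: list of dicts keyed by feature name (hypothetical request format).
    feature_names: the feature order used at training time.
    cat_feature_names: names of the columns trained as categorical.
    """
    # Positions of the categorical columns in the training feature order.
    cat_idx = [feature_names.index(name) for name in cat_feature_names]
    data = []
    for row in rows:
        values = [row[name] for name in feature_names]
        # CatBoost wants categorical values as str (or int), never float.
        for i in cat_idx:
            values[i] = str(values[i])
        data.append(values)
    return data, cat_idx


if __name__ == "__main__":
    names = ["length", "speed_limit", "road_type"]  # hypothetical training order
    data, cat_idx = prepare_rows(
        [{"length": 1.2, "speed_limit": 50, "road_type": "Tertiary"}],
        names, ["road_type"])
    print(cat_idx, data)
    # With catboost installed, the runner call from the answer becomes:
    # from catboost import Pool
    # prediction = runner.predict.run(Pool(data, cat_features=cat_idx))
```

Deriving the indexes from column names, rather than hard-coding `[2]`, keeps the service correct if the feature order changes in a retrained model.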

How do I read and write to the same dataset in Code Repository?

How do I read and write the same dataset in a transform? I have an input dataset (input_ds1) and another input dataset (input_ds2). When I output to one of these datasets' paths (e.g. dataset2 in the code below), the checks fail with a cyclical dependency error.
Below I attached an example:
from transforms.api import transform, Input, Output
@transform(
    input_ds1=Input('/Other Namespace/Other/Foundry_support_test/dataset1'),
    input_ds2=Input('/Other Namespace/Other/Foundry_support_test/dataset2'),
    output=Output('/Other Namespace/Other/Foundry_support_test/dataset2'),
)
def compute(input_ds1, input_ds2, output):
    ...
It is possible to read and write the content of the output dataset with the @incremental() decorator. With it you can read the previous version of the output dataset and avoid the cyclical dependency error.
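from transforms.api import transform, incremental, Input, Output
@incremental()
@transform(
    input_ds1=Input('/Other Namespace/Other/Foundry_support_test/dataset1'),
    output=Output('/Other Namespace/Other/Foundry_support_test/dataset2'),
)
def compute(input_ds1, output):
    # Read the output's previous version instead of declaring dataset2 as an input.
    # On the very first build there is no previous version; see the incremental reference for the schema argument.
    input_ds2 = output.dataframe('previous')
    # ... combine with input_ds1.dataframe() and write the result with output.write_dataframe(...)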
Incremental transforms are mainly designed for other use cases, but they offer many more features than this one. More details are in the incremental documentation: https://www.palantir.com/docs/foundry/transforms-python/incremental-reference/

Predicting a single point of data from a Stata non-linear model

I have a hazard model with a Weibull distribution, fitted with the Stata command streg using the options nohr and time. At least, that's the code in the do-file I downloaded from a set of replication files.
If I have a new observation, how do I compute the model's prediction for it? I would solve it by hand in Excel (my wheelhouse is R or Python), but the closed form of the regression eludes me. The documentation for the command doesn't make clear exactly how the other regressors enter, and the Weibull regression has a lot of parameters that I'd rather not work through by hand. I'm hoping someone can help with what I believe is a simple out-of-sample forecast in a language I simply do not use.
infile warnum frstyear lastyear ccode1 ccode2 length logleng censor oadm oada oadp omdm omda omdp opdm opda opdp durscale rterrain rterrstr summperb sumpopbg popratbg bofadjbg qualratb salscale reprsumb demosumb surpdiff nactors adis3010 using 1perwarf.raw
stset length, id(warnum) fail(censor)
streg oadm oada oadp opda rterrain rterrstr bofadjbg summperb sumpopbg popratbg qualratb surpdiff salscale reprsumb demosumb adis3010 nactors, dist(weibull) nohr time
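With `dist(weibull)` and the `time` option, `streg` fits the accelerated failure-time form, in which the survivor function is S(t | x) = exp{-(exp(-xβ) t)^p}, where xβ includes `_cons` and p = exp(ln_p) comes from the `/ln_p` line of the output. Solving S(t) = 0.5 gives the median survival time exp(xβ)(ln 2)^(1/p), and the mean is exp(xβ)Γ(1 + 1/p). Within Stata, `predict` after `streg` (for example `predict t_med, median time`) computes these for every observation, including a new row appended to the data. A minimal Python sketch of the same arithmetic; the coefficient values in the usage block are placeholders to be replaced by your own `streg` estimates, not real results.

```python
import math


def weibull_aft_predict(coefs, cons, ln_p, x, t=None):
    """Out-of-sample prediction for a Weibull AFT model (streg ..., dist(weibull) time).

    coefs: {regressor: coefficient} from the streg output, excluding _cons
    cons:  the _cons coefficient
    ln_p:  the /ln_p ancillary parameter
    x:     {regressor: value} for the new observation (every regressor in coefs)
    t:     optional time at which to evaluate the survival probability
    """
    xb = cons + sum(b * x[name] for name, b in coefs.items())  # linear predictor
    p = math.exp(ln_p)                                          # Weibull shape parameter
    out = {
        "xb": xb,
        "median": math.exp(xb) * math.log(2) ** (1 / p),  # t where S(t|x) = 0.5
        "mean": math.exp(xb) * math.gamma(1 + 1 / p),     # expected survival time
    }
    if t is not None:
        # Probability of surviving past t: S(t|x) = exp(-(exp(-xb) * t) ** p)
        out["survival"] = math.exp(-((math.exp(-xb) * t) ** p))
    return out


if __name__ == "__main__":
    # Placeholder numbers only: copy the real values from your streg output.
    coefs = {"oadm": 0.10, "nactors": -0.20}
    print(weibull_aft_predict(coefs, cons=1.5, ln_p=0.0,
                              x={"oadm": 2.0, "nactors": 3.0}, t=5.0))
```

Comparing the function's median against `predict ..., median time` for one in-sample row is a quick way to confirm the coefficients were copied correctly.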

How to test DeepLabV3+ on the test set?

The model zoo provides a few pre-trained models for several datasets, such as PASCAL VOC 2012, Cityscapes, etc. I can run them locally and they work on the validation set, because the repository provides code to convert the train/validation sets to TFRecord. However, I couldn't test DeepLabV3+ on the test set.
Is there any way to run it on the test set?
Use [inference] and loop over all the test images.
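A minimal sketch of that loop, assuming you already have some callable that runs the inference code on one image and returns a label map with a `.save(path)` method (for example a PIL image); the name `segment`, the directory layout and the `.png` output naming are all hypothetical. Because the public test sets of PASCAL VOC and Cityscapes come without ground-truth labels, this produces prediction files to inspect or submit, not a local accuracy score.

```python
from pathlib import Path


def run_on_test_set(image_dir, out_dir, segment, exts=(".jpg", ".jpeg", ".png")):
    """Apply `segment` to every image in image_dir and save each prediction.

    segment: hypothetical callable, image path -> object with a .save(path)
             method (e.g. a PIL image of the predicted label map).
    Returns the list of written output paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for img_path in sorted(Path(image_dir).iterdir()):
        if img_path.suffix.lower() not in exts:
            continue  # skip anything that is not an image file
        seg_map = segment(img_path)
        # Keep the original file stem so predictions can be matched to inputs.
        target = out_dir / (img_path.stem + ".png")
        seg_map.save(target)
        written.append(target)
    return written
```

Saving as PNG keeps the label values lossless, which matters for segmentation maps.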

Caffe: Print the softmax score

Take the MNIST example that comes with the Caffe installation.
For any given test image, how do I get the softmax scores for each category and do some processing on them? For example, compute their mean and variance.
I am a newbie, so a detailed answer would help me a lot. I am able to train the model and use the testing feature to get predictions, but I am not sure which files to edit to get the results above.
You can use the Python interface:
import caffe
net = caffe.Net('/path/to/deploy.prototxt', '/path/to/weights.caffemodel', caffe.TEST)
in_ = read_data(...)  # up to you: read a sample and convert it to a numpy array shaped like net.blobs['data'].data
out_ = net.forward(data=in_)  # assuming the input blob of your net is named "data"
Now you have the output of your net in the dictionary out_ (the keys are the names of the output blobs). You can run this in a loop over several examples.
I can try to answer your question. Assuming the softmax layer in your deploy net looks like this:
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc6"
  top: "prob"
}
In the Python code that processes your data, building on the code @Shai provided, you can get the probability of each category by adding:
predicted_prob = net.blobs['prob'].data
predicted_prob is an array containing the probabilities for all categories.
For example, if you only have two categories, predicted_prob[0][0] is the probability that this test sample belongs to the first category and predicted_prob[0][1] is the probability that it belongs to the second.
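For the mean and variance asked about in the question: each row of predicted_prob sums to 1, so the mean over the K categories of a single image is always 1/K, and the more informative statistics are usually per category across many test images. A minimal NumPy sketch, assuming you have stacked the prob outputs of several forward passes into an (N, K) array; the function name `softmax_stats` is hypothetical.

```python
import numpy as np


def softmax_stats(probs):
    """Summarize an (N, K) array of softmax outputs (N images, K categories)."""
    probs = np.asarray(probs, dtype=np.float64)
    return {
        "predicted": probs.argmax(axis=1),     # most likely category per image
        "per_image_var": probs.var(axis=1),    # spread of the K scores within one image
        "per_class_mean": probs.mean(axis=0),  # average score of each category
        "per_class_var": probs.var(axis=0),    # variance of that score across images
    }


if __name__ == "__main__":
    # e.g. rows collected from net.blobs['prob'].data after each forward pass
    probs = np.array([[0.9, 0.1], [0.3, 0.7], [0.6, 0.4]])
    for key, value in softmax_stats(probs).items():
        print(key, value)
```

With a batch size of 1, something like `rows.append(net.blobs['prob'].data[0].copy())` inside the loop collects one row per image; the `.copy()` matters because the blob's memory is overwritten by the next forward pass.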
PS:
If you don't want to write any additional Python script: according to https://github.com/BVLC/caffe/tree/master/examples/mnist
this example automatically runs testing every 500 iterations. The 500 is the test_interval value set in the solver file, https://github.com/BVLC/caffe/blob/master/examples/mnist/lenet_solver.prototxt
So you would need to trace back through the Caffe source code that processes the solver file. I guess it should be https://github.com/BVLC/caffe/blob/master/src/caffe/solver.cpp
I am not sure solver.cpp is the right file to look at, but it contains the testing functions and computes some of these values. I hope it gives you some ideas if no one else can answer your question.