The result of predicting with the CatBoostClassifier model's exported Python file differs from the result of predicting directly with the model - catboost

I want to verify that the predicted results from the exported file are consistent with those predicted directly.
I use the exported Python file containing the CatBoostClassifier model description to predict a result:
But the result predicted directly is 2.175615211102761. I have verified that this discrepancy holds for multiple samples. I want to know why this happens and how to solve it.
float_sample and cat_sample look like
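For reference, here is a minimal sketch of the two prediction paths being compared (the file names, the sample variables, and the import of apply_catboost_model are placeholders; the exported Python file documented by CatBoost defines a function with that name):

from catboost import CatBoostClassifier
# Hypothetical module name for the exported Python description of the model.
from exported_model import apply_catboost_model

model = CatBoostClassifier()
model.load_model("model.cbm")  # hypothetical path to the trained model

# Value computed by the exported Python description of the model.
exported_value = apply_catboost_model(float_sample, cat_sample)

# Direct prediction on the same row in its original column order (row not shown here);
# RawFormulaVal asks for the raw score rather than a class label or probability.
direct_value = model.predict(sample_row, prediction_type="RawFormulaVal")

print(exported_value, direct_value)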

Supplementary question: the results predicted using the Python-language model description file, produced as in the CatBoost tutorial, differ from those predicted directly by the model.


Language translation using TorchText (PyTorch)

I have recently started with ML/DL using PyTorch. The following PyTorch tutorial explains how we can train a simple model for translating from German to English.
https://pytorch.org/tutorials/beginner/torchtext_translation_tutorial.html
However, I am confused about how to use the model for running inference on custom input. From my understanding so far:
1) We will need to save the "vocab" for both German (input) and English (output) [using torch.save()] so that they can be used later for running predictions.
2) At the time of running inference on a German paragraph, we will first need to convert the German text to a tensor using the German vocab file (see the sketch after this list).
3) The above tensor will be passed to the model's forward method for translation.
4) The model will again return a tensor for the destination language, i.e. English in the current example.
5) We will use the English vocab saved in the first step to convert this tensor back to English text.
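As a rough illustration of steps 1 and 2 (the file names are placeholders, and de_vocab / en_vocab are assumed to be the Vocab objects built in the tutorial, which expose a stoi token-to-index mapping):

import torch

# Step 1: after training, persist both vocab objects (placeholder file names).
torch.save(de_vocab, "de_vocab.pth")
torch.save(en_vocab, "en_vocab.pth")

# Step 2: at inference time, reload the German vocab and turn tokenized text into a tensor.
de_vocab = torch.load("de_vocab.pth")
tokens = ["ein", "mann", "geht"]              # output of your German tokenizer
indices = [de_vocab.stoi[t] for t in tokens]  # stoi: string -> index mapping
src_tensor = torch.tensor(indices, dtype=torch.long).unsqueeze(1)  # shape [seq_len, 1]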
Questions:
1) If the above understanding is correct, can the above steps be treated as a generic approach for running inference on any language translation model, provided we know the source and destination languages and have the vocab files for them? Or can we use the vocab provided by third-party libraries like spaCy?
2) How do we convert the output tensor returned from model back to target language? I couldn't find any example on how to do that. The above blog explains how to convert the input text to tensor using source-language vocab.
I could easily find various examples and detailed explanation for image/vision models but not much for text.
Yes, globally what you are saying is correct, and of course you can use any vocab, e.g. one provided by spaCy. To convert a tensor into natural text, one of the most common techniques is to keep both a dict that maps indexes to words and another dict that maps words to indexes; the code below builds these:
from collections import defaultdict

tok2idx = defaultdict(lambda: 0)  # unknown tokens map to index 0
idx2tok = {}
index = 1                         # index 0 is reserved for unknown tokens
for seq in sequences:
    for tok in seq:
        if tok not in tok2idx:
            tok2idx[tok] = index
            idx2tok[index] = tok
            index += 1
Here sequences is a list of all the sequences (i.e. the sentences in your dataset). You can adapt this easily if you only have a list of words or tokens, by keeping only the inner loop.
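To make the reverse direction (question 2) concrete, here is a minimal decoding sketch. It assumes the model's output is a tensor of per-position scores over the target vocabulary, of shape [seq_len, vocab_size], and uses the idx2tok dict built above; both assumptions depend on your model and vocab:

import torch

def tensor_to_text(output_tensor, idx2tok):
    # output_tensor: [seq_len, vocab_size] scores over the English vocabulary
    token_ids = output_tensor.argmax(dim=-1).tolist()   # best-scoring token at each position
    words = [idx2tok.get(i, "<unk>") for i in token_ids]
    return " ".join(words)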

Create LMDB for new test data

I have an LMDB training data file for the VPGNet CNN model pre-trained on the Caltech Lane data set.
I would like to test it on a new data set different from the training data set. How do I create an LMDB for the new test data?
Do I need to modify the prototxt files for testing with the pre-trained net? For testing, do I need a prototxt file, or is there a specific command?
Thanks
The Lightning Memory-Mapped Database (LMDB) format can be processed efficiently as input data.
We create the native format (LMDB) for training and validating the model.
Once the trained model has converged and the loss has been computed on the training and validation data,
we use separate data (unknown data, i.e. data that was not used for training) to run inference with the model.
If we are running classification inference on a single image or a set of images,
we do not need to convert them into LMDB. Instead, we can just run a forward pass on the stacked topology with the image(s) converted into the desired format (NumPy arrays).
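A rough pycaffe sketch of such a forward pass (the file names and the "data" blob name are placeholders, and the exact preprocessing, e.g. mean subtraction and channel order, depends on how the network was trained):

import caffe

# Placeholder file names: use your own deploy prototxt and trained weights.
net = caffe.Net("deploy.prototxt", "vpgnet.caffemodel", caffe.TEST)

# Load and resize the image to the input size expected by the net.
image = caffe.io.load_image("test_image.png")          # HxWxC float array in [0, 1]
h, w = net.blobs["data"].data.shape[2:]
image = caffe.io.resize_image(image, (h, w))
image = image.transpose(2, 0, 1)                       # reorder to CxHxW

# Copy the image into the input blob and run a forward pass.
net.blobs["data"].data[...] = image
output = net.forward()                                 # dict: output blob name -> numpy array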
For more info:
https://software.intel.com/en-us/articles/training-and-deploying-deep-learning-networks-with-caffe-optimized-for-intel-architecture

How to pre-process category info in deeplearning (keras) training input data?

I have category information in the training input data, and I am wondering what the best way to normalize it is.
The category information consists of fields like "city", "gender", etc.
I'd like to use Keras to handle the process.
Scikit-learn has a preprocessing module with functions to normalize or scale your data.
This video gives an example for how to preprocess data that will be used for training a model with Keras. The preprocessing here is done with the library mentioned above.
As shown in the video, with Scikit-learn's MinMaxScaler class you can specify a range that you want your data to be transformed into, and then fit your data to that range using the MinMaxScaler.fit_transform() function.
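As a minimal sketch of that idea (note that string categories such as "city" or "gender" first have to be mapped to numbers before any scaler can be applied; here scikit-learn's LabelEncoder is used for that step, which the answer above does not cover):

import numpy as np
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Toy categorical column (made-up values).
cities = np.array(["london", "paris", "london", "tokyo"])

# Map category strings to integer codes first.
codes = LabelEncoder().fit_transform(cities).reshape(-1, 1)

# Then scale the codes into the desired range for the network input.
scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(codes)
print(scaled)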

dump weights of cnn in json using keras

I want to use the dumped weights and model architecture in another framework for testing.
I know that:
model.get_config() can give the configuration of the model
model.to_json() returns a representation of the model as a JSON string, but the representation does not include the weights, only the architecture
model.save_weights(filepath) saves the weights of the model as an HDF5 file
I want to save the architecture as well as weights in a json file.
Keras does not have any built-in way to export the weights to JSON.
Solution 1:
For now you can easily do it by iterating over the weights and saving them to a JSON file.
weights_list = model.get_weights()
will return a list of all weight tensors in the model, as Numpy arrays.
Then, all you have to do next is iterate over this list and write to the file:

import json
with open("weights.json", "w") as f:
    for i, weights in enumerate(weights_list):
        f.write(json.dumps(weights.tolist()) + "\n")  # one JSON array per line
Solution 2:
import json

weights_list = model.get_weights()
print(json.dumps([w.tolist() for w in weights_list]))
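Since the question asks for the architecture as well as the weights in a single JSON file, the two pieces can be combined; a minimal sketch (the "architecture"/"weights" keys and the file name are arbitrary choices, not a Keras convention):

import json

combined = {
    "architecture": json.loads(model.to_json()),            # architecture as a JSON-compatible dict
    "weights": [w.tolist() for w in model.get_weights()],   # weight tensors as nested lists
}
with open("model_and_weights.json", "w") as f:
    json.dump(combined, f)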

Should the structure of a derived obj file coincide with the naming of the original step file?

When using the Model Derivative API I successfully generate an obj representation from a step file. But within that process are some quirks that I do not fully understand:
The POST job has an output.advanced.exportFileStructure property which can be set to "multiple", and an output.advanced.objectIds property which lets you specify which parts of the model you would like to extract. From the little that the documentation states, I would expect to receive one obj file per requested objectId, which from my experience is not the case. So does this only work for compressed files like .iam and .ipt?
Well, anyway, instead I get one obj file for all objectIds, with one polygon group per objectId. The groups are named (duh!), so I would expect them to be named like their objectId, but it seems like the numbers are assigned in a random way. So how should I actually map an objectId to its corresponding 3D part? Is there any way to link the information from GET :urn/metadata/:guid/properties back to the objects?
I hope somebody can shed light on this. If you need more information, I can provide you with the original step file, the obj and my server log.
You misunderstood the objectIds property of the Derivatives API: specifying that field allows you to export only specific components to a single obj. For example, your car model has 1000 different components, but you just want to export the components that represent the engine: [34, 56, 76] (I just made those up...). If you want to export each objectId to a separate obj file, you need to fire multiple jobs, one per objectId (see the sketch below). The "exportFileStructure" option only applies to composite designs (i.e. assemblies): "single" creates one OBJ file for all the input files (assembly file), while "multiple" creates a separate OBJ file for each object. A step file is not a composite design.
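A rough sketch of firing one job per objectId with Python's requests library (the token, URN, model GUID, and objectIds are placeholders, and the payload follows the Model Derivative POST job format as far as I know):

import requests

TOKEN = "<access_token>"            # placeholder OAuth token
URN = "<base64_encoded_urn>"        # placeholder design URN
MODEL_GUID = "<metadata_guid>"      # placeholder model view GUID
JOB_URL = "https://developer.api.autodesk.com/modelderivative/v2/designdata/job"

object_ids = [34, 56, 76]           # made-up ids, one OBJ export job per id

for oid in object_ids:
    payload = {
        "input": {"urn": URN},
        "output": {
            "formats": [{
                "type": "obj",
                "advanced": {"modelGuid": MODEL_GUID, "objectIds": [oid]},
            }]
        },
    }
    resp = requests.post(JOB_URL, json=payload,
                         headers={"Authorization": "Bearer " + TOKEN})
    print(oid, resp.status_code)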
As you noticed, the obj groups are named randomly. As far as I know, there is no easy, reliable way to map a component in the obj file back to the original objectId, because .obj is a very basic format and it doesn't support metadata. You could use a geometric approach (finding where the component is in space, using bounding boxes, ...) to achieve the mapping, but that could be challenging with complex models.