Number of parameters and FLOPS in ONNX and TensorRT model - deep-learning

Do the number of parameters and the FLOPs (floating-point operations) of a model change when it is converted from PyTorch to the ONNX or TensorRT format?

I don't think Anvar's post answered the OP's question thoroughly, so I did a little research. First, some general information before the answers, because I believe the OP hasn't fully understood what optimizations TensorRT and ONNX apply during the conversion from the PyTorch format.
Both conversions, PyTorch to ONNX and ONNX to TensorRT, increase the performance of the model by applying several different optimizations. The tools actually print information about what they do if you enable their verbose flag.
The preferred way to convert a PyTorch model to TensorRT is to use Torch-TensorRT, as explained here.
TensorRT fuses layers and tensors in the model graph; it then uses a large kernel library to select the implementations that perform best on the target GPU.
ONNX Runtime mostly offers graph optimizations, such as graph simplifications and node fusions, to improve performance.
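For example, you can ask ONNX Runtime to apply all graph optimizations and dump the optimized graph for inspection. A minimal sketch (the file name model.onnx is just a placeholder):
import onnxruntime as ort

session_options = ort.SessionOptions()
session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
# Persist the optimized graph so the fused/simplified nodes can be inspected, e.g. in Netron.
session_options.optimized_model_filepath = "model_optimized.onnx"
session = ort.InferenceSession("model.onnx", session_options)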
1. Does the number of parameters change when converting a PyTorch model to ONNX or TensorRT?
No: even though layers are fused, the number of parameters does not decrease unless there are redundant branches in the model.
I tested this by downloading the yolov5s.onnx model here. The original model has 7.2M parameters according to the repository authors. Then I used this tool to count the parameters in yolov5s.onnx and got 7,225,917 as a result. Thus, the ONNX conversion did not reduce the number of parameters.
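If you prefer not to rely on an external tool, the same count can be obtained directly with the onnx Python package. A minimal sketch (assuming the file is named yolov5s.onnx):
import onnx
from math import prod

model = onnx.load("yolov5s.onnx")
# Every trained weight lives in graph.initializer; summing the element counts
# of those tensors gives the total number of parameters.
num_params = sum(prod(init.dims) for init in model.graph.initializer)
print(num_params)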
I was not able to get equally detailed information for the TensorRT model, but you can get layer information using trtexec. There is a recent question about this, but there are no answers yet.
2. Does the number of FLOPs change when converting a PyTorch model to ONNX or TensorRT?
According to this post, no.
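If you want to sanity-check the FLOPs of a PyTorch model yourself, one option is a third-party profiler such as thop (an assumption on my part, not mentioned in the linked post). A rough sketch, illustrated with a torchvision ResNet-18 as a stand-in model:
import torch
from thop import profile  # third-party: pip install thop
from torchvision.models import resnet18

model = resnet18()  # stand-in model; use your own network here
dummy = torch.randn(1, 3, 224, 224)
# thop reports multiply-accumulate operations (MACs); FLOPs are commonly taken as ~2x MACs.
macs, params = profile(model, inputs=(dummy,))
print(f"MACs: {macs / 1e9:.2f} G, parameters: {params / 1e6:.2f} M")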

I know that since some of the newer versions of PyTorch (I used 1.8 and it worked for me) batch-norm layers and convolutions can be fused while saving the model. I'm not sure about ONNX, but TensorRT actively uses horizontal and vertical fusion of different layers, so the final model is computationally cheaper than the model you initialized.
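To illustrate the conv/batch-norm fusion on the PyTorch side (a minimal sketch, not something the converters require you to do by hand; the layer names are just the positions in an nn.Sequential, and fusion requires eval mode):
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU())
model.eval()
# Fold the BatchNorm parameters into the convolution's weights and bias,
# so at inference time only a single conv (plus ReLU) has to be executed.
fused = torch.quantization.fuse_modules(model, [["0", "1", "2"]])
print(fused)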

Related

How to improve model latency for deployment

Question:
How can I improve model latency for web deployment without retraining the models? What is the checklist that I should go through to improve the models' speed?
Context:
I have multiple models that process a video sequentially on one machine with one K80 GPU; each model takes around 5 mins to process a video that is 1 min long. What ideas and suggestions should I try to improve each model's latency without changing the model architecture? How should I structure my thinking about this problem?
Sampling frames is the easiest technique if it fits your use case. Picking every 5th frame for inference will cut your inference time by roughly 5x (theoretically). The caveat is that if you are working on tasks like object tracking you will have reduced accuracy.
Converting from FP32 to FP16 might increase your inference speed (see the sketch after this list).
Batch inference generally lowers inference time by a decent amount. Ref: https://github.com/ultralytics/yolov5/issues/1806#issuecomment-752834571
Multiprocess concurrent inference is basically spinning up more than one instance of the same model in separate processes and running inference in parallel. torch has a multiprocessing module, torch.multiprocessing. I haven't ever used this, but I assume the setup would be somewhat significant and complex.
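A minimal sketch of the FP32-to-FP16 idea in PyTorch (assuming a CUDA-capable GPU and that all layers tolerate half precision; older GPUs such as the K80 may see little or no speedup from FP16, so measure before committing):
import torch

model = torch.nn.Conv2d(3, 16, 3)        # stand-in for your real model
model = model.eval().cuda().half()       # cast the weights to FP16
frame = torch.randn(1, 3, 224, 224)      # stand-in for a preprocessed video frame
with torch.no_grad():
    output = model(frame.cuda().half())  # the input dtype must match the model's
    output = output.float()              # back to FP32 for any downstream post-processing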
Nvidia Tesla K80 is quite an old GPU (2014), so that's probably the reason why the processing time is so long. If your machine has a modern Intel CPU and/or iGPU you could try OpenVINO. It's a heavily optimized toolkit for inference. Here are some performance benchmarks.
You can find a full tutorial on how to convert the PyTorch model here.
Some snippets below.
Install OpenVINO
The easiest way to do it is using PIP. Alternatively, you can use this tool to find the best way in your case.
pip install openvino-dev[pytorch,onnx]
Save your model to ONNX
OpenVINO cannot convert a PyTorch model directly for now, but it can convert an ONNX model. This sample code assumes the model is for computer vision.
import torch  # the model, IMAGE_HEIGHT and IMAGE_WIDTH are assumed to be defined already
dummy_input = torch.randn(1, 3, IMAGE_HEIGHT, IMAGE_WIDTH)
torch.onnx.export(model, dummy_input, "model.onnx", opset_version=11)
Use Model Optimizer to convert ONNX model
The Model Optimizer is a command-line tool which comes with the OpenVINO Development Package, so be sure you have installed it. It converts the ONNX model to IR, which is the default format for OpenVINO. It also changes the precision to FP16 for even better performance (there shouldn't be much of an accuracy drop). Run in the command line:
mo --input_model "model.onnx" --input_shape "[1,3,224,224]" --mean_values="[123.675,116.28,103.53]" --scale_values="[58.395,57.12,57.375]" --data_type FP16 --output_dir "model_ir"
Run the inference on the CPU
The converted model can be loaded by the runtime and compiled for a specific device, e.g. the CPU. Use AUTO if you want to deploy on the best device you have.
from openvino.runtime import Core
# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="AUTO")
# Get output layer
output_layer_ir = compiled_model_ir.output(0)
# Run inference on the input image
result = compiled_model_ir([input_image])[output_layer_ir]
Disclaimer: I work on OpenVINO.

How to merge two ONNX deep learning models

I have two models that are in ONNX format. Both models are similar (both are pre-trained deep learning models, e.g. ResNet50 models). The only difference between them is that the last layers are optimized/retrained for different data sets.
I want to merge the first k layers of these two models so that the shared part is computed only once. This should enhance the performance of inference.
To make my case clearer, here are examples from other machine learning tools that implement this feature, e.g. PyTorch and Keras.
You can use the ONNX package and the APIs it exposes (https://github.com/onnx/onnx/blob/master/docs/PythonAPIOverview.md) to mutate models/graphs.
The sclblonnx package provides a number of higher level functions to edit ONNX graphs, including the ability to merge two subgraphs.
The sclblonnx package does not support dynamic-size inputs.
For dynamic-size inputs, one solution would be writing your own code using the ONNX API, as stated earlier. Another solution would be converting the two ONNX models to a framework (TensorFlow or PyTorch) using tools like onnx-tensorflow or onnx2pytorch. You could then manipulate the networks in TensorFlow or PyTorch and export the whole network back to ONNX format.
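As a rough illustration of the first route with the official onnx package (a sketch, not a complete solution: the file and tensor names are hypothetical, both models must share the same opset version, and attaching two task-specific heads to one backbone additionally requires renaming nodes, e.g. with onnx.compose.add_prefix, to avoid name collisions):
import onnx
from onnx import compose
from onnx.utils import extract_model

# Cut the shared first k layers out of model A; "images" and "backbone_out"
# must be actual tensor names in that graph.
extract_model("model_a.onnx", "backbone.onnx",
              input_names=["images"], output_names=["backbone_out"])
# Cut the task-specific tail out of model B (assumes the same intermediate tensor name).
extract_model("model_b.onnx", "head_b.onnx",
              input_names=["backbone_out"], output_names=["logits_b"])

# Stitch the two subgraphs together: the backbone output feeds the head input.
backbone = onnx.load("backbone.onnx")
head_b = onnx.load("head_b.onnx")
merged = compose.merge_models(backbone, head_b,
                              io_map=[("backbone_out", "backbone_out")])
onnx.save(merged, "merged.onnx")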

Dataset equivalent in Julia Flux

I want to use Flux to train a deep learning model on audio files. In the Flux documentation, the whole data array (with all examples) is passed to a dataloader that feeds the train!() function with a list of batches. The problem is that I don't have enough memory in my system to load all the audio files at once.
In PyTorch, the dataloader would be fed by a dataset object that has the logic to open one file at a time in its __getitem__() method.
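For reference, the PyTorch pattern I mean looks roughly like this (a sketch with hypothetical file names; only the paths live in memory and each file is loaded lazily):
import torch
from torch.utils.data import Dataset, DataLoader

class AudioDataset(Dataset):
    def __init__(self, file_paths):
        self.file_paths = file_paths  # only the paths are kept in memory

    def __len__(self):
        return len(self.file_paths)

    def __getitem__(self, idx):
        # One file is read from disk only when the dataloader asks for it;
        # torch.load is a placeholder for an actual audio loader such as torchaudio.load.
        waveform = torch.load(self.file_paths[idx])
        return waveform

loader = DataLoader(AudioDataset(["clip_0.pt", "clip_1.pt"]), batch_size=2)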
So, what is the right way to implement this in Flux/Julia? What is the equivalent of a Torch dataset?
I found this thread on Julia discourse forum that covers basically what I am asking in this question.
https://discourse.julialang.org/t/pytorch-dataloader-equivalent-for-training-large-models-with-flux/30763
Among the recommendations on that topic is the package MLDataUtils.jl, which offers similar functionality with the nobs() and getobs() functions.

Deep Neural Network Weights Evaluation

I am using Theano with Keras. I have a trained DNN and I have dumped the weights into a file. I am performing some operations on these weights and dumping the new, converted weights into another file.
Now, I am loading my DNN model with these converted weights and want to compare the results between the two.
I used the Keras evaluate method, but I find the accuracy to be exactly the same even though the weights are different.
Is there another approach with which I can compare the accuracy?
Thanks.
Keras performs some under-the-hood operations related to your batch_size, including normalization. So if you only scaled and translated your image, the result will stay the same.
Anyway, you can do model.predict(sample, 1) and write your own evaluation metric to circumvent this issue.
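A minimal sketch of that idea (assuming model, x_test and y_test are defined as in the question, y_test holds integer class labels, and the two weight file names are hypothetical):
import numpy as np

# Compare the two weight sets with a hand-rolled metric instead of model.evaluate().
model.load_weights("original_weights.h5")
preds_original = model.predict(x_test, batch_size=1)

model.load_weights("converted_weights.h5")
preds_converted = model.predict(x_test, batch_size=1)

acc_original = np.mean(np.argmax(preds_original, axis=1) == y_test)
acc_converted = np.mean(np.argmax(preds_converted, axis=1) == y_test)
print(acc_original, acc_converted)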

Caffe Autoencoder

I want to compare the performance of a CNN and an autoencoder in Caffe. I'm completely familiar with CNNs in Caffe, but I want to know: does the autoencoder also have a deploy.prototxt file? Are there any differences in using these two models other than the architecture?
Yes it also has a deploy.prototxt.
Both train_val.prototxt and deploy.prototxt are CNN architecture description files. The sole difference between them is that train_val.prototxt takes training data and a loss as input/output, while deploy.prototxt takes a test image as input and produces the predicted value as output.
Here is an example of a CNN and an autoencoder for MNIST: Caffe Examples. (I have not tried the examples.) Using the models is generally the same. Learning rates etc. depend on the model.
You would need to implement an autoencoder example yourself using Python or MATLAB. The example in Caffe is not a true autoencoder because it doesn't set up a layer-wise training stage, and during training it doesn't enforce the tied-weight constraint W_{L->L+1} = W_{L+1->L+2}^T. It is easy to find a 1D autoencoder on GitHub, but a 2D autoencoder may be hard to find.
The main differences between an autoencoder and a conventional network are:
In an autoencoder, the input itself serves as the label image during training.
An autoencoder tries to make its output approximate the input.
Autoencoders do not have a softmax layer during training.
An autoencoder can be used as a pre-trained model for your network, which then converges faster than with other pre-trained models, because the network has already learned to extract features from your data.
Conventional training and testing can then be performed on the pre-trained autoencoder network for faster convergence and better accuracy.
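As a minimal illustration of these points (sketched in Keras rather than Caffe, purely to show that the input is also the target and that there is no softmax layer):
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

# Tiny dense autoencoder: the input is also the target and there is no softmax layer.
inp = Input(shape=(784,))
encoded = Dense(64, activation="relu")(inp)
decoded = Dense(784, activation="sigmoid")(encoded)
autoencoder = Model(inp, decoded)
autoencoder.compile(optimizer="adam", loss="mse")

x = np.random.rand(256, 784).astype("float32")   # stand-in for flattened training images
autoencoder.fit(x, x, epochs=1, batch_size=32)   # note: x is both the input and the label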