My end goal is to use the TensorRT to optimize my model for deployment. I am doing my experiments in tensorflow. But the TensorRT requires the input model in caffemodel format.
As of TensorRT 2.1, there is no support to convert TensorFlow model. This might be supported in the future.
You might want to explore options to convert TensorFlow model to Caffe and then convert that using TensorRT into an inference engine.
New TensorRT 3 added support for TensorFlow.
Related
Does number of parameters and FLOPS (float operations per second) change when convert a model from PyTorch to ONNX or TensorRT format?
I don't think Anvar's post answered OP's question thoroughly so I did a little bit of research. Some general info before the answers to the questions as I believe OP hasn't understood fully what TensorRT and ONNX optimizations happen during the conversion from PyTorch format.
Both conversions, Pytorch to ONNX and ONNX to TensorRT increase the performance of the model by using several different optimizations. The tools actually print you information about what they do if you choose the verbose flag for them.
The preferred way to convert a Pytorch model to TensorRT is to use Torch-TensorRT as explained here.
TensorRT fuses layers and tensors in the model graph, it then uses a large kernel library to select implementations that perform best on the target GPU.
ONNX runtime offers mostly graph optimizations such as graph simplifications and node fusions to improve performance.
1. Does the number of parameters change when converting a PyTorch model to ONNX or TensorRT?
No: even though the layers are fused the number of parameters does not decrease unless there are some redundant branches in the model.
I tested this by downloading the yolov5s.onnx model here. The original model has 7.2M parameters according to the repository authors. Then I used this tool to count the number of parameters in the yolov5.onnx model and got 7225917 as a result. Thus, onnx conversion did not reduce the amount of parameters.
I was not able to get as elaborate information for TensorRT model but you can get layer information using trtexec. There is a recent question about this but there are no answers yet.
2. Does the number of FLOPS change when converting a PyTorch model to ONNX or TensorRT?
According to this post, no.
I know that since some of new versions of Pytorch (I used 1.8 and it worked for me) there are some fusions of batch norm layers and convolutions while saving model. I'm not sure about ONNX, but TensorRT actively uses horizontal and vertical fusion of different layers, so final model would be computational cheaper, than model that you initialized.
I tried to convert my Pytorch models to TensorFlow Lite with ONNX. But my inference time from TensorFlow Lite is twice as slow as Tensorflow and Pytorch. I run TensorFlow Lite model in google colab and this is my first time using TensorFlow Lite.
Here is my code to convert from Tensorflow to TensorFlow Lite:
converter = tf.lite.TFLiteConverter.from_saved_model("model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
model_lite = converter.convert()
with open('model.tflite', 'wb') as f:
f.write(model_lite)
Any suggestions will help me a lot.
TensorFlow Lite models are supposed to run fast on embedded devices. So you have to use it inside an android phone to find out the time. Colab notebook will not give you the correct time.
You can also use benchmark tool to measure inference time of steady state.
If you would like to run inference on a PC or Google Colab, I'd recommend OpenVINO. OpenVINO is optimized for Intel hardware but it should work with any CPU. It optimizes the inference performance by e.g. graph pruning and fusing some operations. Here are the performance benchmarks for PyTorch models, among others.
You can find a full tutorial on how to convert the PyTorch model here. Some snippets below.
Install OpenVINO
The easiest way to do it is using PIP. Alternatively, you can use this tool to find the best way in your case.
pip install openvino-dev[pytorch,onnx]
Save your model to ONNX
OpenVINO cannot convert the PyTorch model directly for now but it can do it with the ONNX model. This sample code assumes the model is for computer vision.
dummy_input = torch.randn(1, 3, IMAGE_HEIGHT, IMAGE_WIDTH)
torch.onnx.export(model, dummy_input, "model.onnx", opset_version=11)
Use Model Optimizer to convert ONNX model
The Model Optimizer is a command line tool that comes from OpenVINO Development Package so be sure you have installed it. It converts the ONNX model to OV format (aka IR), which is a default format for OpenVINO. It also changes the precision to FP16 (to further increase performance). Run in a command line:
mo --input_model "model.onnx" --input_shape "[1, 3, 224, 224]" --mean_values="[123.675, 116.28 , 103.53]" --scale_values="[58.395, 57.12 , 57.375]" --data_type FP16 --output_dir "model_ir"
Run the inference on the CPU
The converted model can be loaded by the runtime and compiled for a specific device e.g. CPU or GPU (integrated into your CPU like Intel HD Graphics). If you don't know what is the best choice for you, just use AUTO.
# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="CPU")
# Get output layer
output_layer_ir = compiled_model_ir.output(0)
# Run inference on the input image
result = compiled_model_ir([input_image])[output_layer_ir]
It's worth mentioning that Runtime can process the ONNX model directly. In that case, just skip the conversion (Model Optimizer) step and give onnx path to the read_model function.
Disclaimer: I work on OpenVINO.
I have two models that are in ONNX format. Both models are similar (both are pre-trained deep learning models, ex. ResNet50 models). The only difference between them is that the last layers are optimized/retrained for different data sets.
I want to merge the first k layers of these two models, as shown below. This should enhance the performance of inference.
To make my case clearer these are examples from other machine learning tools to implement this feature Ex. Pytorch, Keras.
You can use the ONNX package and the APIs it exposes (https://github.com/onnx/onnx/blob/master/docs/PythonAPIOverview.md) to mutate models/graphs.
The sclblonnx package provides a number of higher level functions to edit ONNX graphs, including the ability to merge two subgraphs.
sclblonnx package does not support dynamic size inputs.
For dynamic size inputs, one solution would be writing your own code using ONNX API as stated earlier.Another solution would be converting the two ONNX models to a framework(Tensorflow or PyTorch) using tools like onnx-tensorflow or onnx2pytorch. You could manipulate the networks in the Tensorflow or Pytorch and export the whole network to Onnx format.
I have met a problem that I train my model in pytorch 0.4.1, but I can't find a tool to convert it into caffe model.
How to use a pytorch 0.4.1 model to init pytorch 0.2.0?
Or how to convert a pytorch 0.4.1 model to a caffe model?
Assuming you are using Caffe2, you can use ONNX to convert models trained in one AI framework to another (with some limitations). In your case you need to export your PyTorch trained model to ONNX model and then import the ONNX model to Caffe2 model.
Follow these tutorials:
PyTorch to ONNX export: link
ONNX to Caffe2 import: link
The tutorials are pretty straightforward, you don't need to write much code. But there are some limitations while exporting your PyTorch model to ONNX.
Check this link for in-depth tutorial on Pytorch to ONNX.
Comment below if you have any doubts/ questions.
EDIT: Try this repo, I have tested it on a toy model it is working.
I wanna compare the performance of CNN and autoencoder in caffe. I'm completely familiar with cnn in caffe but I wanna is the autoencoder also has deploy.prototxt file ? is there any differences in using this two models rather than the architecture?
Yes it also has a deploy.prototxt.
both train_val.prototxt and 'deploy.prototxt' are cnn architecture description files. The sole difference between them is, train_val.prototxt takes training data and loss as input/output, but 'deploy.prototxt' takes testing image as input, and predicted value as out put.
Here is an example of a cnn and autoencoder for MINST: Caffe Examples. (I have not tried the examples.) Using the models is generally the same. Learning rates etc. depend on the model.
You need to implement an auto-encoder example using python or matlab. The example in Caffe is not true auto-encoder because it doesn't set layer-wise training stage and during training stage, it doesn't fix W{L->L+1} = W{L+1->L+2}^T. It is easily to find a 1D auto-encoder in github, but 2D auto-encoder may be hard to find.
The main difference between the Auto encoders and conventional network is
In Auto encoder your input is your label image for training.
Auto encoder tries to approximate the output similar as input.
Auto encoders does not have softmax layer while training.
It can be used as a pre-trained model for your network which converge faster comparing to other pre-trained models. It is because your network has already extracted the features for your data.
The Conventional training and testing you can perform on pre trained auto encoder network for faster convergence and accuracy.