Core ML model including custom layer is slow - caffe

I modified coremltools (Caffe to Core ML, https://apple.github.io/coremltools/generated/coremltools.converters.caffe.convert.html#coremltools.converters.caffe.convert) and added a custom layer to the Core ML model. But compared with the Core ML model that has no custom layer, the model with the custom layer runs slower. I have found that coremlc compiles the mlmodel and fuses some layers, for example conv+bn+scale and conv+relu.
(screenshots: conv+bn+scale fused; conv+relu fused)
I profiled the Core ML model with Instruments (Time Profiler) and found that the model with the custom layer spends time in many more functions than the model without it. Does anyone know why?
(screenshot: Core ML model Time Profiler trace on CPU)

Related

TF Lite optimization

I tried to convert my PyTorch models to TensorFlow Lite via ONNX, but inference with TensorFlow Lite is twice as slow as with TensorFlow and PyTorch. I run the TensorFlow Lite model in Google Colab, and this is my first time using TensorFlow Lite.
Here is my code to convert from TensorFlow to TensorFlow Lite:
# Convert the SavedModel to TensorFlow Lite with default optimizations
# and float16 weights.
converter = tf.lite.TFLiteConverter.from_saved_model("model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
model_lite = converter.convert()

with open('model.tflite', 'wb') as f:
    f.write(model_lite)
Any suggestions will help me a lot.
TensorFlow Lite models are meant to run fast on mobile and embedded devices, so you have to run the model on an Android phone to get representative timings; a Colab notebook will not give you the correct time.
You can also use the TFLite benchmark tool to measure steady-state inference time.
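If you still want a rough number in Colab, a minimal sketch like the one below times the TFLite interpreter directly. It assumes the model.tflite produced above and a single float32 input; adjust the shape and dtype to your model.

import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()

# Dummy input matching the model's expected shape and dtype.
dummy = np.random.random_sample(input_details[0]['shape']).astype(np.float32)
interpreter.set_tensor(input_details[0]['index'], dummy)

# Warm-up run, then time the steady state.
interpreter.invoke()
start = time.perf_counter()
for _ in range(100):
    interpreter.invoke()
print("avg latency (ms):", (time.perf_counter() - start) / 100 * 1000)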
If you would like to run inference on a PC or in Google Colab, I'd recommend OpenVINO. It is optimized for Intel hardware, but it should work with any CPU. It optimizes inference performance with, for example, graph pruning and operation fusion. Here are the performance benchmarks for PyTorch models, among others.
You can find a full tutorial on how to convert the PyTorch model here. Some snippets below.
Install OpenVINO
The easiest way is to use pip. Alternatively, you can use this tool to find the best way for your case.
pip install openvino-dev[pytorch,onnx]
Save your model to ONNX
OpenVINO cannot convert a PyTorch model directly for now, but it can convert the ONNX model. This sample code assumes the model is for computer vision.
import torch

dummy_input = torch.randn(1, 3, IMAGE_HEIGHT, IMAGE_WIDTH)  # (batch, channels, H, W)
torch.onnx.export(model, dummy_input, "model.onnx", opset_version=11)
Use Model Optimizer to convert ONNX model
The Model Optimizer is a command-line tool that comes with the OpenVINO Development Package, so be sure you have it installed. It converts the ONNX model to the OpenVINO format (aka IR), which is the default format for OpenVINO. It also changes the precision to FP16 (to further increase performance). Run in a command line:
mo --input_model "model.onnx" --input_shape "[1, 3, 224, 224]" --mean_values="[123.675, 116.28, 103.53]" --scale_values="[58.395, 57.12, 57.375]" --data_type FP16 --output_dir "model_ir"
Run the inference on the CPU
The converted model can be loaded by the runtime and compiled for a specific device, e.g. CPU or GPU (the graphics integrated into your CPU, like Intel HD Graphics). If you don't know what the best choice for you is, just use AUTO.
from openvino.runtime import Core

# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="CPU")

# Get the output layer
output_layer_ir = compiled_model_ir.output(0)

# Run inference on the input image
result = compiled_model_ir([input_image])[output_layer_ir]
It's worth mentioning that the OpenVINO Runtime can process the ONNX model directly. In that case, just skip the conversion (Model Optimizer) step and pass the .onnx path to the read_model function.
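A minimal sketch of that shortcut, assuming the exported file from the ONNX step above is named model.onnx:

from openvino.runtime import Core

ie = Core()
model_onnx = ie.read_model(model="model.onnx")  # no IR conversion needed
compiled_model_onnx = ie.compile_model(model=model_onnx, device_name="CPU")
result = compiled_model_onnx([input_image])[compiled_model_onnx.output(0)]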
Disclaimer: I work on OpenVINO.

Backbone network in Object detection

I am trying to understand the training process of an object detection deep learning algorithm, and I am having some problems understanding how the backbone network (the network that performs feature extraction) is trained.
I understand that it is common to use CNNs like AlexNet, VGGNet, and ResNet, but I don't understand whether these networks are pre-trained or not. If they are not pre-trained, what does their training consist of?
We usually use a pre-trained VGGNet or ResNet backbone directly. Although the backbone is pre-trained for a classification task, its hidden layers learn features that are useful for object detection as well. The initial layers learn low-level features such as lines, dots and curves, while later layers learn high-level features built on top of the low-level ones to detect objects and larger shapes in the image.
The last layers are then replaced so that the network outputs object detection coordinates rather than a class label.
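As an illustration, here is a minimal PyTorch sketch (assuming torchvision is available; the shapes are only illustrative) of reusing a classification-pretrained ResNet as a detection feature extractor:

import torch
import torchvision

# Load a ResNet-50 pre-trained on ImageNet classification.
backbone = torchvision.models.resnet50(pretrained=True)

# Drop the average-pool and fully connected classification head;
# keep the convolutional trunk that outputs spatial feature maps.
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2])

images = torch.randn(1, 3, 224, 224)   # dummy batch
features = feature_extractor(images)   # shape: [1, 2048, 7, 7]
# Detection heads (box regression + classification) are trained on top
# of these feature maps, while the backbone is fine-tuned or frozen.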
There are object detection specific backbones too. Check these papers:
DetNet: A Backbone network for Object Detection
CBNet: A Novel Composite Backbone Network Architecture for Object Detection
DetNAS: Backbone Search for Object Detection
High-Resolution Network: A universal neural architecture for visual recognition
Lastly, the pre-trained weights will be useful only if you are using them on similar images. For example, weights trained on ImageNet will be useless on ultrasound medical image data; in that case we would rather train from scratch.

how to serve deep learning model without GPU

To save cost, I'm running a deep learning model on a regular CPU. It takes 10 seconds to finish a request, and the code is written in Python.
I'm thinking about improving performance by using Java, C++, or Rust. Is there any existing Rust framework for running a deep learning model?
Is there any existing Rust framework for running a deep learning model?
I am not familiar with Rust frameworks, but if you are running your model on an Intel CPU, I would suggest exporting the model to ONNX and running it with MXNet using the Intel MKL-DNN backend. This should give you the best performance, as it uses the Intel MKL-DNN and Intel MKL libraries. You can use C++ or Python.
Install mxnet with MKLDNN
https://mxnet.apache.org/versions/1.6/api/python/docs/tutorials/performance/backend/mkldnn/mkldnn_readme.html
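Once the ONNX file is exported, a minimal sketch of loading and running it with MXNet 1.x could look like the following; the file name, input name "data", and input shape are illustrative and must match what your exporter produced.

import mxnet as mx
import numpy as np
from mxnet.contrib import onnx as onnx_mxnet

# Import the ONNX graph into MXNet symbols and parameters.
sym, arg_params, aux_params = onnx_mxnet.import_model("model.onnx")

# Bind the network for CPU inference (MKL-DNN is used automatically
# when MXNet was built with it).
mod = mx.mod.Module(symbol=sym, data_names=["data"], label_names=None,
                    context=mx.cpu())
mod.bind(for_training=False, data_shapes=[("data", (1, 3, 224, 224))])
mod.set_params(arg_params, aux_params)

batch = mx.io.DataBatch([mx.nd.array(np.random.rand(1, 3, 224, 224))])
mod.forward(batch)
output = mod.get_outputs()[0].asnumpy()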
TensorFlow's performance-critical parts are written in C++, so using another language won't make a drastic performance difference. You can quantize your network or apply network pruning to increase performance.
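As an example of pruning, a minimal sketch with the TensorFlow Model Optimization toolkit is shown below; it assumes your network is a Keras model (model) with training data (x_train, y_train) and that tensorflow-model-optimization is installed.

import tensorflow_model_optimization as tfmot

# Wrap the model so that 50% of the weights are gradually zeroed out.
pruning_params = {
    "pruning_schedule": tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.5,
        begin_step=0, end_step=1000),
}
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(model, **pruning_params)

pruned_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# Fine-tune with the pruning callback, then strip the pruning wrappers.
pruned_model.fit(x_train, y_train, epochs=2,
                 callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)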

Is there any Inference engine for CNN on Vivante GC2000 GPU with OpenCL 1.1 EP?

I have an i.MX6 Quad board with a Vivante GC2000 GPU, which provides an OpenCL 1.1 EP implementation. I have a trained convolutional neural network model that I just want to run on the i.MX6 GPU with OpenCL 1.1 EP. I have tried to cross-compile TensorFlow for it, but it's not working. I want to know if there is any open-source project or community that provides an inference engine to run a CNN on OpenCL 1.1 EP.

Is the Xception model in Keras the best model described in the paper?

I read the Xception paper, and in Section 4.7 it is mentioned that the best results are achievable without any activation. Now I want to use this network on videos with the Keras toolbox, but the model in Keras uses the ReLU activation function. Does the Keras model correspond to the best configuration, or is it better to omit the ReLU layers?
You are confusing the normal activations used after convolutional and dense layers with the ones discussed in the paper. Section 4.7 only deals with varying the activation between the depthwise and pointwise convolutions; the rest of the activations in the architecture are kept unchanged.
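To illustrate the distinction, here is a minimal Keras sketch (layer sizes are arbitrary): the variant studied in Section 4.7 concerns an activation inserted between the depthwise and pointwise steps, not the ReLUs that follow each full separable convolution.

from tensorflow.keras import layers

inputs = layers.Input(shape=(299, 299, 3))

# What the Keras Xception blocks do: a separable convolution
# (depthwise + pointwise with nothing in between), then ReLU.
x = layers.SeparableConv2D(128, 3, padding="same")(inputs)
x = layers.ReLU()(x)

# The Section 4.7 variant would instead put an activation between
# the depthwise and pointwise convolutions (the paper found this hurts):
y = layers.DepthwiseConv2D(3, padding="same")(inputs)
y = layers.ReLU()(y)            # intermediate activation
y = layers.Conv2D(128, 1)(y)    # pointwise (1x1) convolution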