LDA topic distribution at training process and inference process


I have a question about LDA, a popular topic modeling technique.
An LDA model is trained on a set of training documents.
The model is then used to infer the topic distribution over documents for the same data set that was used for training.
In this case, is the topic distribution produced during training the same as the one produced during inference?
I'm asking because I have tried plda. It does not output the topic distribution during training, but it does during inference. So if the two topic distributions are almost identical, I can use plda even though it produces no output during training.

Related

Number of parameters and FLOPS in ONNX and TensorRT model

Do the number of parameters and FLOPS (floating-point operations per second) change when converting a model from PyTorch to ONNX or TensorRT format?

I don't think Anvar's post answered the OP's question thoroughly, so I did a little research. First, some general information, as I believe the OP has not fully understood which TensorRT and ONNX optimizations happen during conversion from the PyTorch format.
Both conversions, PyTorch to ONNX and ONNX to TensorRT, improve the performance of the model by applying several different optimizations. The tools print information about what they do if you enable their verbose flag.
The preferred way to convert a PyTorch model to TensorRT is to use Torch-TensorRT as explained here.
TensorRT fuses layers and tensors in the model graph, then uses a large kernel library to select the implementations that perform best on the target GPU.
ONNX Runtime mostly offers graph optimizations, such as graph simplifications and node fusions, to improve performance.
1. Does the number of parameters change when converting a PyTorch model to ONNX or TensorRT?
No: even though layers are fused, the number of parameters does not decrease unless the model contains redundant branches.
I tested this by downloading the yolov5s.onnx model here. The original model has 7.2M parameters according to the repository authors. I then used this tool to count the parameters in the yolov5s.onnx model and got 7225917. Thus, the ONNX conversion did not reduce the number of parameters.
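A count like the one above can be reproduced with a few lines of Python. This is only a minimal sketch, not the tool linked above: `count_parameters` is a hypothetical helper that sums the element counts of a list of weight tensor shapes. With the `onnx` package installed, the shapes could be read from the model's graph initializers, as noted in the comment. Initializers may also hold constants that are not trainable weights, so the result can differ slightly from what a training framework reports.

```python
from math import prod

def count_parameters(shapes):
    """Sum the number of elements over a list of tensor shapes."""
    return sum(prod(shape) for shape in shapes)

# Assumption: with the onnx package installed, the shapes could be collected as
#   import onnx
#   model = onnx.load("yolov5s.onnx")
#   shapes = [tuple(t.dims) for t in model.graph.initializer]
# Here, hand-written toy shapes stand in for a real model.
toy_shapes = [
    (16, 3, 3, 3),  # conv weight: 16 filters, 3 input channels, 3x3 kernel
    (16,),          # conv bias
    (10, 16),       # linear weight
    (10,),          # linear bias
]
total = count_parameters(toy_shapes)
print(total)  # 618
```

Running the same count on the PyTorch checkpoint and on the exported file is a simple way to check whether a conversion changed the number of stored values.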
I was not able to get equally detailed information for the TensorRT model, but you can get layer information using trtexec. There is a recent question about this, but it has no answers yet.
2. Does the number of FLOPS change when converting a PyTorch model to ONNX or TensorRT?
According to this post, no.

I know that in some newer versions of PyTorch (I used 1.8 and it worked for me) batch norm layers and convolutions are fused when the model is saved. I'm not sure about ONNX, but TensorRT actively uses horizontal and vertical fusion of different layers, so the final model would be computationally cheaper than the model you initialized.
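To make the batch norm fusion mentioned above concrete, here is a minimal sketch in plain Python for a single output channel of a 1x1 convolution, where the convolution reduces to a dot product. The helper names `conv`, `batch_norm` and `fold_bn` are made up for illustration and are not part of PyTorch or TensorRT. The fused layer produces the same output, so the separate normalization step no longer has to be computed at inference time.

```python
import math

def conv(x, w, b):
    """1x1 convolution for one output channel: dot product plus bias."""
    return sum(xi * wi for xi, wi in zip(x, w)) + b

def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch norm using stored running statistics."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold the batch norm statistics into the convolution weight and bias."""
    scale = gamma / math.sqrt(var + eps)
    fused_w = [wi * scale for wi in w]
    fused_b = (b - mean) * scale + beta
    return fused_w, fused_b

# Arbitrary example values, chosen only for illustration.
x = [0.5, -1.0, 2.0]
w, b = [0.2, 0.4, -0.1], 0.3
gamma, beta, mean, var = 1.5, 0.1, 0.05, 0.8

unfused = batch_norm(conv(x, w, b), gamma, beta, mean, var)
fused_w, fused_b = fold_bn(w, b, gamma, beta, mean, var)
fused = conv(x, fused_w, fused_b)

# Both paths agree up to floating-point rounding.
print(abs(unfused - fused) < 1e-9)
```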

How to do Transfer Learning with LSTM for time series forecasting?

I am working on a project about time-series forecasting using LSTM layers. The dataset used for training and testing the model was collected from 443 people who wore a sensor that samples a physical variable (1 variable/measure) every 5 minutes; for each patient there are around 5000 records/readings.
Although I can train and test my model under different scenarios, I am having trouble finding information about how to apply transfer learning to such an architecture. I understand that I can use inductive transfer learning by copying the weight matrices from the general model to a secondary model (for an unknown person), and then retrain this model on that person's data and evaluate the result.
But I would like to know whether anybody knows other ways to apply transfer learning to this type of architecture, or where to find information about it, since there aren't many scientific papers on it; most of them cover NLP and other applications rather than time series.
Cheers X )

What are backend weights in deep learning models (yolo)?

I'm pretty new to deep learning, and I couldn't find or figure out what backend weights are, such as
full_yolo_backend.h5
squeezenet_backend.h5
From what I have found and experimented with, these backend weights have fundamentally different model architectures, for example:
the yolov2 model has 40+ layers but the backend has only 20+ layers (?)
you can build on top of the backend model with your own networks (?)
using backend models tends to yield poorer results (?)
I was hoping for some explanation of backend weights vs. actual models, for learning purposes. Thank you so much!

I'm not sure which implementation you are using, but in many applications you can consider a deep model as a feature extractor whose output is more or less task-agnostic, followed by a number of task-specific heads.
The choice of backend depends on your specific constraints in terms of tradeoff between accuracy and computational complexity. Examples of classical but time-consuming choices for backends are resnet-101, resnet-50 or VGG that can be coupled with FPN (feature pyramid networks) to yield multiscale features. However, if speed is your main concern then you can use smaller backends such as different MobileNet architectures or even the vanilla networks such as the ones used in the original Yolov1/v2 papers (tinyYolo is an extreme case).
Once you have chosen your backend (you can use a pretrained one), you can load its weights (that is what your *.h5 files are). On top of that, you add a small head that carries out the tasks you need: this can be classification, bbox regression, or, as in MaskRCNN, foreground/background segmentation. For Yolov2, you can add just a few layers, for example 3 convolutional layers (with non-linearities of course), that output a tensor of size
BxC1xC2xAxP
#B==batch size
#C1==number of vertical cells
#C2==number of horizontal cells
#A==number of anchors
#P==number of parameters (i.e. bbx parameters, class prediction, confidence)
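As a small illustration of that output shape, the sketch below computes it for a YOLOv2-style head. It assumes the common layout in which each anchor predicts 4 box coordinates, 1 confidence score and one score per class, so P = 5 + number of classes. `yolo_head_shape` is a hypothetical helper, not part of any YOLO implementation.

```python
def yolo_head_shape(batch, grid_h, grid_w, num_anchors, num_classes):
    """Return (B, C1, C2, A, P) and the channel count of the last conv layer."""
    params = 4 + 1 + num_classes       # box coordinates + confidence + class scores
    channels = num_anchors * params    # the final conv emits A * P channels per cell
    return (batch, grid_h, grid_w, num_anchors, params), channels

# Example: 13x13 grid, 5 anchors, 20 classes, batch of 8.
shape, channels = yolo_head_shape(batch=8, grid_h=13, grid_w=13,
                                  num_anchors=5, num_classes=20)
print(shape)     # (8, 13, 13, 5, 25)
print(channels)  # 125
```

The last convolutional layer of the head therefore needs A * P output channels, which are then reshaped into the separate anchor and parameter dimensions.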
Then, you can just save/load the weights of this head separately. When you are happy with your results though, training jointly (end-to-end) will usually give you a small boost in accuracy.
Finally, to come back to your last question, I assume that you are getting poor results with the backends because you are loading only the backend weights and not the weights of the heads. Another possibility is that you are using a head trained with backend X but have switched the backend to Y. In that case, since the head expects different features, it is natural to see a drop in performance.

Deep Learning Sequence 2 Sequence models

I have a general question concerning seq2seq models. There are lots of Open Source Tools like TensorFlow, Torch and others. But I did not find an answer for my question:
Is it possible to add training data to a once trained model without starting the whole training process from the beginning?

How to use previously generated topic-word distribution matrix for the new LDA topic generation process?

Let's say that we have executed the LDA topic generation process (with Gibbs sampling) once. For the next round of LDA topic generation, how can we make use of the already existing topic matrix? Does any library support this kind of feature?
