How to use a previously generated topic-word distribution matrix for a new LDA topic generation process?

Let's say that we have executed the LDA topic generation process (with Gibbs sampling) once. For the next round of LDA topic generation, how can we make use of the already existing topic-word matrix? Does any library support this kind of feature?
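One library that supports something along these lines is gensim (note that it uses variational Bayes rather than Gibbs sampling): its LdaModel accepts a full topic-word matrix as the eta prior, so the distribution from a previous run can be used to seed a new one. A minimal sketch with toy data:

```python
# Sketch: warm-starting a new gensim LdaModel with the topic-word matrix
# from a previous run, passed in as an asymmetric eta (topic-word) prior.
from gensim import corpora, models

texts = [["topic", "modelling", "example"], ["another", "example", "document"]]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

# First round of training.
lda1 = models.LdaModel(corpus, id2word=dictionary, num_topics=2, passes=10)

# get_topics() returns the (num_topics x vocab_size) topic-word matrix.
old_topic_word = lda1.get_topics()

# Second round: the old matrix biases the new model's word-topic assignments.
lda2 = models.LdaModel(corpus, id2word=dictionary, num_topics=2,
                       eta=old_topic_word, passes=10)
```

For a Gibbs-sampling implementation specifically, the same idea would mean initialising the sampler's topic-word counts from the old matrix, which, as far as I know, few off-the-shelf tools expose.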

Related

Number of parameters and FLOPS in ONNX and TensorRT model

Do the number of parameters and FLOPs (floating-point operations) change when converting a model from PyTorch to ONNX or TensorRT format?
I don't think Anvar's post answered the OP's question thoroughly, so I did a little research. First, some general info, since I believe the OP hasn't fully understood which TensorRT and ONNX optimizations happen during conversion from the PyTorch format.
Both conversions, PyTorch to ONNX and ONNX to TensorRT, improve the performance of the model through several different optimizations. The tools print information about what they do if you enable their verbose flags.
The preferred way to convert a PyTorch model to TensorRT is to use Torch-TensorRT, as explained here.
TensorRT fuses layers and tensors in the model graph; it then uses a large kernel library to select the implementations that perform best on the target GPU.
ONNX Runtime mostly offers graph optimizations, such as graph simplifications and node fusions, to improve performance.
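For example, you can watch ONNX Runtime's optimizations being applied by exporting a model and enabling the highest optimization level; a sketch (model choice and file names are just illustrative):

```python
# Sketch: export a PyTorch model to ONNX, then load it in ONNX Runtime with
# all graph optimizations (simplifications, node fusions) enabled.
import torch
import torchvision
import onnxruntime as ort

model = torchvision.models.resnet18(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "resnet18.onnx", opset_version=13)

opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
# Writing out the optimized graph lets you inspect which fusions were applied.
opts.optimized_model_filepath = "resnet18_optimized.onnx"
session = ort.InferenceSession("resnet18.onnx", opts)
```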
1. Does the number of parameters change when converting a PyTorch model to ONNX or TensorRT?
No: even though layers are fused, the number of parameters does not decrease unless there are redundant branches in the model.
I tested this by downloading the yolov5s.onnx model here. The original model has 7.2M parameters according to the repository authors. I then used this tool to count the parameters in yolov5s.onnx and got 7,225,917 as a result. Thus, the ONNX conversion did not reduce the number of parameters.
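If you want to reproduce such a count yourself, summing the sizes of the initializers (the stored weight tensors) in an ONNX file gives the parameter count; a sketch, assuming the model file is available locally:

```python
# Sketch: count the parameters of an ONNX model by summing the element
# counts of its initializers (the stored weight tensors).
import onnx
from onnx import numpy_helper

model = onnx.load("yolov5s.onnx")
n_params = sum(numpy_helper.to_array(init).size
               for init in model.graph.initializer)
print(f"{n_params} parameters")
```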
I was not able to get equally detailed information for the TensorRT model, but you can get layer information using trtexec. There is a recent question about this, but it has no answers yet.
2. Does the number of FLOPs change when converting a PyTorch model to ONNX or TensorRT?
According to this post, no.
I know that since some of the newer versions of PyTorch (I used 1.8 and it worked for me), batch-norm layers and convolutions are fused while saving the model. I'm not sure about ONNX, but TensorRT actively uses horizontal and vertical fusion of different layers, so the final model will be computationally cheaper than the model you initialized.
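The conv + batch-norm fusion mentioned above can also be triggered explicitly in PyTorch with the eager-mode fusion utility; a small self-contained sketch:

```python
# Sketch: explicit conv + batch-norm fusion in PyTorch. The BN statistics are
# folded into the conv weights, so inference gets cheaper while the outputs
# stay numerically (almost) identical.
import torch
import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

model = Tiny().eval()  # fusion is only valid in eval mode
fused = torch.quantization.fuse_modules(model, [["conv", "bn"]])

x = torch.randn(1, 3, 32, 32)
assert torch.allclose(model(x), fused(x), atol=1e-5)
```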

Dataset equivalent in Julia Flux

I want to use Flux to train a deep learning model on audio files. In the Flux documentation, they pass the whole data array (with all examples) to a dataloader that feeds the train!() function with a list of batches. The problem is that I don't have enough memory in my system to load all audio files at once.
In PyTorch, the dataloader would be fed by a dataset object that holds the logic to open one file at a time in its __getitem__() method.
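For reference, the PyTorch pattern being described looks roughly like this (load_audio and paths are placeholders for your own loading logic):

```python
# Sketch of the PyTorch lazy-loading pattern: __getitem__ opens one file at
# a time, so the full dataset never has to fit in memory.
from torch.utils.data import Dataset, DataLoader

class AudioDataset(Dataset):
    def __init__(self, file_paths):
        self.file_paths = file_paths

    def __len__(self):
        return len(self.file_paths)

    def __getitem__(self, idx):
        # load_audio is a placeholder, e.g. torchaudio.load + preprocessing
        return load_audio(self.file_paths[idx])

loader = DataLoader(AudioDataset(paths), batch_size=32, shuffle=True)
```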
So, what is the right way to implement this in Flux/Julia? What is the equivalent of a Torch dataset?
I found this thread on Julia discourse forum that covers basically what I am asking in this question.
https://discourse.julialang.org/t/pytorch-dataloader-equivalent-for-training-large-models-with-flux/30763
Among the recommendations on that topic is the package MLDataUtils.jl, which offers similar functionality via the nobs() and getobs() functions.

Fitting step in Deep Q Network

I am confused about why the DQN-with-experience-replay algorithm performs a gradient descent step at every step in a given episode. This fits only one step, right? It would make training extremely slow. Why not update after each episode ends, or every time the model is cloned?
In the original paper, the authors push one sample per step to the experience replay buffer and randomly sample 32 transitions to train the model in minibatch fashion. The samples taken from interacting with the environment are not fed directly to the model. To increase training speed, the authors store a sample every step but update the model only every four steps.
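In sketch form, that cadence looks like this (env, select_action, and optimise are placeholders, not any particular library's API):

```python
# Sketch of the DQN cadence described above: push one transition into the
# replay buffer per environment step, but run a 32-sample minibatch gradient
# update only every 4 steps.
import random
from collections import deque

replay_buffer = deque(maxlen=100_000)
BATCH_SIZE = 32
UPDATE_EVERY = 4

state = env.reset()  # env is a placeholder environment
for step in range(1_000_000):
    action = select_action(state)                 # e.g. epsilon-greedy
    next_state, reward, done = env.step(action)
    replay_buffer.append((state, action, reward, next_state, done))
    state = env.reset() if done else next_state

    if step % UPDATE_EVERY == 0 and len(replay_buffer) >= BATCH_SIZE:
        batch = random.sample(replay_buffer, BATCH_SIZE)
        optimise(model, batch)                    # one minibatch gradient step
```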
Have a look at OpenAI's Baselines project; this single-process method can master easy games like Atari Pong (Pong-v4) in about 2.5 hours on a single GPU. Of course, training in this single-process way leaves a multi-core, multi-GPU (or single-GPU) system's resources underutilised, so newer publications have decoupled action selection from model optimisation: they use multiple "actors" to interact with environments simultaneously and a single GPU "learner" to optimise the model, or multiple learners with multiple models on various GPUs.
The multi-actor-single-learner setup is described in DeepMind's Ape-X DQN (Distributed Prioritized Experience Replay, Horgan et al., 2018), and the multi-actor-multi-learner setup in Accelerated Methods for Deep Reinforcement Learning (Stooke and Abbeel, 2018). When using multiple learners, parameter sharing across processes becomes essential. An older approach is described in DeepMind's PDQN (Massively Parallel Methods for Deep Reinforcement Learning, Nair et al., 2015), proposed in the period between DQN and A3C. However, that work ran entirely on CPUs, so despite using massive resources its results can easily be outperformed by PAAC's batched action-selection-on-GPU method.
You can't optimise only at the end of each episode, because episode length isn't fixed: a better model usually produces longer episodes, so the model would receive fewer updates just as it starts performing better, and learning progress would be unstable.
We also don't train the model only when the target model is cloned, because the target network exists to stabilise training by keeping an older set of parameters. If you update only when the parameters are cloned, the target network's parameters will always equal the model's, which causes instability: with identical parameters, one model update will also raise the estimated value of the next state.
DeepMind's 2015 Nature paper states:
The second modification to online Q-learning aimed at further improving the stability of our method with neural networks is to use a separate network for generating the target y_j in the Q-learning update. More precisely, every C updates we clone the network Q to obtain a target network Q' and use Q' for generating the Q-learning targets y_j for the following C updates to Q.
This modification makes the algorithm more stable compared to standard online Q-learning, where an update that increases Q(s_t, a_t) often also increases Q(s_{t+1}, a) for all a and hence also increases the target y_j, possibly leading to oscillations or divergence of the policy. Generating the targets using the older set of parameters adds a delay between the time an update to Q is made and the time the update affects the targets y_j, making divergence or oscillations much more unlikely.
Human-level control through deep reinforcement learning, Mnih et al., 2015
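Concretely, "every C updates we clone the network" amounts to something like the following (a sketch with placeholder q_net, optimizer, and sample_batch, and a plain MSE loss where the paper clips the error):

```python
# Sketch of the target-network mechanism from the quote: targets y_j are
# computed with an older parameter copy Q', refreshed every C updates.
import copy
import torch
import torch.nn.functional as F

C = 10_000       # clone period ("every C updates")
gamma = 0.99

target_net = copy.deepcopy(q_net)  # q_net is a placeholder Q-network

for update in range(num_updates):
    s, a, r, s_next, done = sample_batch()  # placeholder minibatch sampler
    with torch.no_grad():
        # y_j = r_j + gamma * max_a Q'(s_{j+1}, a), using the older Q'
        y = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if update % C == 0:
        target_net.load_state_dict(q_net.state_dict())  # clone Q -> Q'
```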

LDA topic distribution at training process and inference process

I have a question about LDA, a popular topic modeling technique.
An LDA model is created from a certain set of training documents.
Then, the topic distribution over the documents of a data set identical to the one used for training is inferred based on the LDA model.
In this case, is the topic distribution created during the training process the same as the one created during the inference process?
I'm asking because I have tried plda. It does not output the topic distribution during training, only during inference. So if the two topic distributions are almost identical, I can use plda even though it has no output at training time.
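For what it's worth, the comparison is easy to run empirically in a library like gensim, which, like plda, produces per-document distributions by running inference over the corpus after training; whether they match the training-time statistics exactly depends on the implementation. A toy sketch:

```python
# Sketch: train an LDA model, then infer topic distributions for the very
# documents it was trained on, to inspect how stable they are.
from gensim import corpora, models

texts = [["apple", "fruit", "tree"], ["car", "engine", "wheel"]]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, passes=20)

# Inference on the exact corpus used for training.
for bow in corpus:
    print(lda.get_document_topics(bow, minimum_probability=0.0))
```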

Topic models evaluation in Gensim

I've been experimenting with LDA topic modelling using Gensim. I couldn't find any topic model evaluation facility in Gensim that could report the perplexity of a topic model on held-out evaluation texts, thus facilitating subsequent fine-tuning of LDA parameters (e.g. the number of topics). It would be greatly appreciated if anyone could shed some light on how I can perform topic model evaluation in Gensim. This question has also been posted on MetaOptimize.
Found the answer on the gensim mailing list.
In short, the bound() method of LdaModel computes a variational lower bound on the log likelihood of a held-out corpus, from which perplexity can be derived; newer versions also provide log_perplexity().
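A sketch of that evaluation with toy data (gensim's log_perplexity() normalises the bound per word; perplexity is then 2 to the power of the negated bound):

```python
# Sketch: held-out evaluation in gensim. log_perplexity() returns the
# per-word variational lower bound; perplexity = 2 ** (-bound).
import numpy as np
from gensim import corpora, models

train_texts = [["topic", "model", "text"], ["word", "distribution", "text"]]
heldout_texts = [["topic", "word", "model"]]

dictionary = corpora.Dictionary(train_texts)
train_corpus = [dictionary.doc2bow(t) for t in train_texts]
heldout_corpus = [dictionary.doc2bow(t) for t in heldout_texts]

lda = models.LdaModel(train_corpus, id2word=dictionary,
                      num_topics=2, passes=10)

per_word_bound = lda.log_perplexity(heldout_corpus)
print("perplexity:", np.exp2(-per_word_bound))
```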