Stacking/chaining CNNs for different use cases - deep-learning

So I'm getting more and more into deep learning using CNNs.
I was wondering if there are examples of "chained" CNNs (I don't know what the correct term would be) - what I mean by that is using, e.g., a first CNN to perform a semantic segmentation task and feeding its output as input to a second CNN which, for example, performs a classification task.
My questions would be:
What is the correct term for this sequential use of neural networks?
Is there a way to pack multiple networks into one "big" network which can be trained in a single step, instead of training 2 models and combining them?
Also if anyone could maybe provide a link so I could read about that kind of stuff, I'd really appreciate it.
Thanks a lot in advance!

Sequential use of independent neural networks can have different interpretations:
The first model may be viewed as a feature extractor and the second one as a classifier.
It may be viewed as a special case of stacking (stacked generalization) with a single model on the first level.
It is common practice in deep learning to chain multiple models together and train them jointly. This is usually called end-to-end learning. Please see this answer about it: https://ai.stackexchange.com/questions/16575/what-does-end-to-end-training-mean
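To make the second question concrete, here is a minimal PyTorch sketch of that idea; all layer sizes and class names are illustrative placeholders, not a specific published architecture. Two sub-networks are wrapped in one module, so a single optimizer trains both stages jointly and the gradient of the classification loss flows back through the first stage:

    import torch
    import torch.nn as nn

    class SegmentationNet(nn.Module):  # stand-in for the first CNN
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

        def forward(self, x):
            return torch.relu(self.conv(x))  # coarse "segmentation" features

    class ClassifierNet(nn.Module):  # stand-in for the second CNN
        def __init__(self, num_classes=10):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.fc = nn.Linear(8, num_classes)

        def forward(self, x):
            return self.fc(self.pool(x).flatten(1))

    class ChainedNet(nn.Module):
        """Wraps both stages so one optimizer trains them jointly:
        the classification loss backpropagates through stage 1 too."""
        def __init__(self):
            super().__init__()
            self.stage1 = SegmentationNet()
            self.stage2 = ClassifierNet()

        def forward(self, x):
            return self.stage2(self.stage1(x))

    model = ChainedNet()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # one optimizer for both stages
    criterion = nn.CrossEntropyLoss()

    x = torch.randn(4, 3, 32, 32)   # dummy image batch
    y = torch.randint(0, 10, (4,))  # dummy class labels
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()                 # gradients flow end-to-end
    optimizer.step()

In practice you would often also add an intermediate (e.g. segmentation) loss on the first stage's output, but a single joint loss is the simplest end-to-end setup.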

Related

Creating a dataset of images for object detection for extremely specific task

Even though I am quite familiar with the concepts of Machine Learning & Deep Learning, I have never needed to create my own dataset before.
Now, for my thesis, I have to create my own dataset with images of an object for which no datasets are available on the internet (just take this as given).
I have limited computational power, so I want to use YOLO, SSD or EfficientDet.
Do I need to go over every single image in my dataset by eye and create bounding box center coordinates and dimensions to log them with their labels?
Thanks
Yes, you will need to do that.
At the same time, though the task is niche, you could benefit from the concept of transfer learning. That is, you can use a pre-trained backbone to help your model learn faster, achieve better results, and need fewer annotated examples, but you will still need to annotate the new dataset on your own. A sketch of this idea follows.
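As a rough illustration of that transfer-learning idea, here is a sketch using torchvision's Faster R-CNN, simply because that is an API I can show concretely; the same principle applies to YOLO, SSD or EfficientDet, whose toolkits have their own fine-tuning entry points:

    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

    # start from a detector whose backbone was pre-trained on COCO
    # (newer torchvision versions use weights="DEFAULT" instead of pretrained=True)
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

    # swap the box-classification head for your own classes (+1 for background)
    num_classes = 2  # e.g. your niche object + background
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    # now fine-tune on your (smaller) hand-annotated dataset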
You can use software such as LabelBox as a starting point; it is very good since it allows you to output annotations in Pascal VOC, YOLO and COCO formats, so it is a matter of choice/what is more suitable for you.

Where to find deep learning based prediction model

I need to find a deep learning based prediction model, where can I find it?
You can use PyTorch and TensorFlow pretrained models.
https://pytorch.org/docs/stable/torchvision/models.html
They can be downloaded automatically. There is some sample code that you can try:
https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py
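For instance, loading one of those pretrained torchvision models takes only a few lines; a minimal sketch (assuming an older torchvision, since newer versions use a weights= argument instead of pretrained=True):

    import torch
    import torchvision.models as models

    # the weights are downloaded automatically on first use
    model = models.resnet18(pretrained=True)
    model.eval()  # inference mode

    x = torch.randn(1, 3, 224, 224)  # dummy ImageNet-sized input
    with torch.no_grad():
        logits = model(x)
    print(logits.argmax(dim=1))      # predicted ImageNet class index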
If you are interested in deep learning, I suggest you review the basics in Stanford's CS231n course. Your question is a bit odd, because you first need to define your task specifically. "Prediction" is not a good description. You could look for models for classification, segmentation, object detection, sequence-to-sequence tasks (like translation), and so on...
Then you need to know how to search through projects on GitHub, and you need to know Python (in most cases), and then use a pretrained model or train/fine-tune a model on your own dataset for that task. Then you can hope that you have found a good model for your task, after which you need to validate the results on a test set. However, deploying a model in real-life scenarios is another matter where you need to consider many other things, and you often need some online-learning strategy, like Federated Learning. I hope that I could help you.

Deep Learning methods for Text Generation (PyTorch)

Greetings to everyone,
I want to design a system that is able to generate stories or poetry based on a large dataset of text, without needing to feed a text description/start/summary as input at inference time.
So far I have done this using RNNs, but as you know they have a lot of flaws. My question is, what are the best methods to achieve this task at the moment?
I searched for possibilities using attention mechanisms, but it turns out that they are mostly geared towards translation tasks.
I know about GPT-2, BERT, Transformer, etc., but all of them need a text description as input before generation, and this is not what I'm seeking. I want a system able to generate stories from scratch after training.
Thanks a lot!
Edit
To clarify, the comment was: I want to generate text from scratch, not starting from a given sentence at inference time. I hope it makes sense.
Yes, you can do that; it's just simple code manipulation on top of the ready-made models, be it BERT, GPT-2 or an LSTM-based RNN.
How? You have to provide random input to the model. Such random input can be a randomly chosen word or phrase, or just a vector of zeroes.
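For instance, here is a minimal sketch with Hugging Face's transformers library (an assumption on my part that you use it); GPT-2's tokenizer has a beginning-of-sequence token, and seeding generation with only that token stands in for "no prompt":

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    # the only "input" is the beginning-of-sequence token, no prompt text
    input_ids = torch.tensor([[tokenizer.bos_token_id]])

    output = model.generate(
        input_ids,
        max_length=100,
        do_sample=True,  # sample instead of greedy decoding, for variety
        top_k=50,
        pad_token_id=tokenizer.eos_token_id,
    )
    print(tokenizer.decode(output[0], skip_special_tokens=True))

After fine-tuning the model on your corpus of stories or poems, the same call will produce text in that style from scratch.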
Hope it helps.
You have mixed up several things here.
You can achieve what you want using either an LSTM-based or a transformer-based architecture.
When you say you did it with an RNN, you probably mean that you tried an LSTM-based sequence-to-sequence model.
Now, there is attention in your question. You can use attention to improve your RNN, but it is not a required condition. However, if you use the transformer architecture, attention is built into the transformer blocks.
GPT-2 is nothing but a transformer-based model. Its building block is the transformer architecture.
BERT is also another transformer-based architecture.
So to answer your question: you can and should try using an LSTM-based or transformer-based architecture to achieve what you want. Sometimes such an architecture is called GPT-2, sometimes BERT, depending on how it is realized.
I encourage you to read this classic from Karpathy; if you understand it, you will have cleared up most of your questions:
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
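If you go the LSTM route, a minimal character-level sketch in PyTorch, in the spirit of that post, could look like the following (layer sizes are illustrative and the training loop is omitted); generation feeds each sampled character back in as the next input:

    import torch
    import torch.nn as nn

    class CharRNN(nn.Module):
        def __init__(self, vocab_size, hidden=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden)
            self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
            self.out = nn.Linear(hidden, vocab_size)

        def forward(self, x, state=None):
            h, state = self.lstm(self.embed(x), state)
            return self.out(h), state

    def sample(model, start_id, length):
        """Generate character ids one at a time, feeding each
        sampled character back in as the next input."""
        ids, state = [start_id], None
        x = torch.tensor([[start_id]])
        for _ in range(length):
            logits, state = model(x, state)
            probs = torch.softmax(logits[0, -1], dim=-1)
            next_id = torch.multinomial(probs, 1).item()
            ids.append(next_id)
            x = torch.tensor([[next_id]])
        return ids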

How to apply model free deep reinforcement learning when the access to the real environment is hard?

Deep Reinforcement Learning can be very useful when applied to real-world problems which have a highly dynamic nature. A few examples are finance, healthcare, etc. But when it comes to these kinds of problems, it is hard to have a simulated environment. So what are the possible things to do?
Let me first comment on a couple of concepts, trying to give you future research directions according to your comments:
Probably the term "forecast" is not suitable to describe the kind of problems solved by Reinforcement Learning. In some sense, RL needs to do an internal forecast process to choose the best actions in the long term. But the problem solved is an agent choosing actions in an environment. So, if your problem is a forecast problem, maybe other techniques are more suitable than RL.
Between tabular methods and deep Q-learning there exist many other methods that may be more suitable to your problem. They are probably less powerful but easier to use (more stable, less parameter tuning, etc.). You can combine Q-learning with other function approximators (simpler than a deep neural network). In general, the best choice is the simplest one able to solve the problem.
I don't know how to simulate the problem of human activities with first-person vision. In fact, I don't fully understand the problem setup.
And regarding the original question of applying RL without access to a simulated environment: as I previously said in the comments, if you have enough data, you could probably apply an RL algorithm. I'm assuming that you can store data from your environment but cannot easily interact with it. This is typical, for example, in medical domains where there exist many data about [patient status, treatment, next patient status], but you cannot interact with patients by applying random treatments. In this situation, there are some facts to take into account:
RL methods generally consume a very large quantity of data. This is especially true when combined with deep nets. How much data is necessary depends entirely on the problem, but be ready to store millions of tuples [state, action, next state] if your environment is complex.
The stored tuples should be collected with a policy which contains some exploratory actions. The RL algorithm will try to find the best possible actions among the ones contained in the data. If the agent can interact with the environment, it should choose exploratory actions to find the best one. Similarly, if the agent cannot interact and instead the data is gathered in advance, this data should also contain exploratory actions. The papers Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method and Tree-Based Batch Mode Reinforcement Learning could be helpful to understand these concepts.
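To make that batch setting concrete, here is a rough PyTorch sketch in the spirit of fitted Q iteration; all names, shapes and hyperparameters are my own illustrative assumptions, not the papers' exact algorithms. The key point is that the agent learns purely from pre-collected tuples and never touches the real environment:

    import torch
    import torch.nn as nn

    def fitted_q_iteration(qnet, batches, gamma=0.99, iters=100):
        """batches: pre-collected (state, action, reward, next_state, done)
        tensors gathered in advance with an exploratory policy."""
        opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
        loss_fn = nn.MSELoss()
        for _ in range(iters):
            for s, a, r, s2, done in batches:
                with torch.no_grad():
                    # bootstrap target from the best action in the next state;
                    # done is a 0/1 float masking terminal transitions
                    target = r + gamma * qnet(s2).max(dim=1).values * (1 - done)
                q = qnet(s).gather(1, a.unsqueeze(1)).squeeze(1)
                loss = loss_fn(q, target)
                opt.zero_grad()
                loss.backward()
                opt.step()
        return qnet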

Generate photos based on over 1M photos processed by ourselves before

We are running a huge team that processes child photos for our customers; the team processes over 1M photos per year.
The process includes basic tuning of the light, resizing, and applying some filters to make the skin look better.
We want to use deep learning to complete these jobs as much as possible. That means we want to choose one model, train it on our existing data, and then use the trained model to generate processed photos from new, unprocessed photos.
Is there an existing model that I can make use of, or any papers that have covered this scenario?
Any help would be appreciated, thanks!
You could try something like this: https://arxiv.org/pdf/1412.7725.pdf. But with deep learning and your amount of training data, you can probably get any big enough model to work well.
Image generation is not what you should search for. Image generation means that an image is generated (almost) completely from nothing. You want to enhance an existing image.
Although I haven't read any papers about this scenario so far, searching for "image enhancement neural network" reveals several promising results:
A Survey on Image Enhancement Techniques: Classical Spatial Filter, Neural Network, Cellular Neural Network, and Fuzzy Filter: http://ieeexplore.ieee.org/document/4237993/
A new class of nonlinear filters for image enhancement: http://ieeexplore.ieee.org/document/150915/
An image enhancement technique combining sharpening and noise reduction: http://ieeexplore.ieee.org/document/1044761/
I guess you could do the following:
Create a CNN model. The only "special" thing about this model is that its target is not a fully connected layer, but another (3-channel) image. You have to adjust the error function accordingly (similar to semantic segmentation).
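A minimal PyTorch sketch of that idea, with illustrative layer sizes; the training target is the hand-processed photo rather than a class label, and a per-pixel loss replaces cross-entropy:

    import torch
    import torch.nn as nn

    class EnhanceNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 3, kernel_size=3, padding=1),  # 3-channel image output
            )

        def forward(self, x):
            return self.net(x)

    model = EnhanceNet()
    criterion = nn.MSELoss()  # per-pixel error against the processed photo
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    raw = torch.randn(2, 3, 128, 128)        # dummy unprocessed photos
    processed = torch.randn(2, 3, 128, 128)  # dummy hand-processed targets
    optimizer.zero_grad()
    loss = criterion(model(raw), processed)
    loss.backward()
    optimizer.step()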