Convolutional Layers for non-image data - deep-learning

I often see guides and examples using Convolutional Layers when implementing Deep Q-Networks. This makes sense for some scenarios, typically where you do not have access to the state in, for example, an array representation.
In my case, I have a game environment which gives me complete access to the state, in the form of a 2D array. This 2D array is later interpreted by a graphics engine and drawn to the screen.
I have seen Convolutional Layers recommended for interpreting images, but I have yet to see any recommendation about flattening the 2D state representation directly and using dense layers instead.
Does it make any sense to use Convolutional Networks/Layers for data that are not images?
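For concreteness, here is a minimal sketch, assuming TensorFlow/Keras and a made-up 10x10 grid state with 4 discrete actions, of the two options discussed above: convolutions over the 2D state versus flattening it straight into dense layers.

```python
import tensorflow as tf

grid_shape = (10, 10, 1)   # hypothetical 2D state with a channel axis added
num_actions = 4            # hypothetical number of discrete actions

# Option A: treat the 2D state like an image and use convolutional layers.
conv_net = tf.keras.Sequential([
    tf.keras.Input(shape=grid_shape),
    tf.keras.layers.Conv2D(16, 3, activation="relu", padding="same"),
    tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(num_actions),   # one Q-value per action
])

# Option B: flatten the 2D state and use dense layers only.
dense_net = tf.keras.Sequential([
    tf.keras.Input(shape=grid_shape),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(num_actions),
])
```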

Related

PyTorch Geometric GCN Autoencoder with Flat Latent Space

I have a problem in which I have a series of observations, each of which is a graph of the same structure, but with different node features. I would like to learn a flat embedding of each graph of size 32x1.
My thought was to do this with an autoencoder. This would take the input graph, apply some graph convolutions, use a dense layer to map the graph to a 32x1 latent space, and then reconstruct the graph (using the same common structure) before applying a few more convolutions.
As far as I am aware, this is in contrast to the typical graph autoencoder framework, in which the latent representation is a graph with the same structure as the input but with latent representations of each node's features.
For this reason, I am not sure how to implement such an architecture using PyTorch Geometric. Namely, I am unsure how I go from the flat latent space back to a graph.
Is this possible, and if so, roughly how would I do so?
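For reference, a rough sketch of the architecture described above, assuming PyTorch Geometric, a fixed number of nodes per graph, and a shared edge structure; the class name GraphAE and the sizes num_nodes, in_dim and hidden_dim are illustrative, not from any library.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, global_mean_pool

class GraphAE(nn.Module):
    def __init__(self, num_nodes, in_dim, hidden_dim=64, latent_dim=32):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.to_latent = nn.Linear(hidden_dim, latent_dim)            # flat 32-d code per graph
        self.from_latent = nn.Linear(latent_dim, num_nodes * hidden_dim)
        self.deconv = GCNConv(hidden_dim, in_dim)                     # refine on the known, shared structure
        self.num_nodes, self.hidden_dim = num_nodes, hidden_dim

    def forward(self, x, edge_index, batch):
        # Encoder: graph convolutions, then pool to one vector per graph.
        h = self.conv1(x, edge_index).relu()
        h = self.conv2(h, edge_index).relu()
        z = self.to_latent(global_mean_pool(h, batch))                # [num_graphs, 32]
        # Decoder: expand the flat code back to per-node features,
        # then apply a graph convolution over the same edge structure.
        h = self.from_latent(z).view(-1, self.hidden_dim)             # [num_graphs * num_nodes, hidden]
        x_hat = self.deconv(h, edge_index)
        return x_hat, z
```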

Backbone network in Object detection

I am trying to understand the training process of an object detection deep learning algorithm, and I am having some problems understanding how the backbone network (the network that performs feature extraction) is trained.
I understand that it is common to use CNNs like AlexNet, VGGNet, and ResNet, but I don't understand if these networks are pre-trained or not. If they are not pre-trained, what does the training consist of?
We directly use a pre-trained VGGNet or ResNet backbone. Although the backbone is pre-trained for a classification task, its hidden layers learn features which can also be used for object detection. Initial layers learn low-level features such as lines, dots, curves, etc. Later layers learn high-level features, built on top of the low-level ones, to detect objects and larger shapes in the image.
Then the last layers are modified to output the object detection coordinates rather than classes.
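As a minimal sketch of that idea, assuming torchvision: load a pre-trained classification backbone, drop its classification layer, and attach a simple box/class head. Real detectors add anchor and box machinery on top; the class and box counts below are made up.

```python
import torch
import torch.nn as nn
import torchvision

# Pre-trained ImageNet backbone (older torchvision versions use pretrained=True instead).
backbone = torchvision.models.resnet50(weights="IMAGENET1K_V1")
backbone.fc = nn.Identity()                 # drop the 1000-class classification layer

num_classes, num_boxes = 20, 1              # illustrative values only
head = nn.Linear(2048, num_boxes * (4 + num_classes))   # 4 box coordinates + class scores

features = backbone(torch.randn(1, 3, 224, 224))         # [1, 2048] pooled features
detections = head(features)                               # toy detection output
```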
There are object detection specific backbones too. Check these papers:
DetNet: A Backbone network for Object Detection
CBNet: A Novel Composite Backbone Network Architecture for Object Detection
DetNAS: Backbone Search for Object Detection
High-Resolution Network: A universal neural architecture for visual recognition
Lastly, the pretrained weights will be useful only if you are using them for similar images. E.g. weights trained on ImageNet will be useless on ultrasound medical image data. In this case we would rather train from scratch.

What is the ideal steps in using predicted segmentation masks for watershed post processing?

I am experimenting with object segmentation (round-shaped objects that often occur close together). I have used the U-Net deep neural network architecture for segmentation and obtained segmentation masks, which I saved in .npy format.
I am a beginner in this area. I would like to know the ideal steps that I should follow now, if I want to apply watershed on the predicted masks with the aim of separating the objects.
I guess I need to convert the predicted binary mask into some form from which I can obtain markers indicating the centroids.
Please help
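A rough sketch of the usual steps, assuming scikit-image and SciPy (the file name, threshold, and min_distance value are illustrative): threshold the predicted mask, compute a distance transform, take its local maxima as markers, and run the watershed on the inverted distance map.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

prob = np.load("predicted_mask.npy")          # your saved U-Net output (name is illustrative)
mask = prob > 0.5                              # binary foreground mask

distance = ndi.distance_transform_edt(mask)    # distance of each foreground pixel to the background
coords = peak_local_max(distance, min_distance=10, labels=mask)   # roughly one peak per object

markers = np.zeros_like(mask, dtype=int)
markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)          # give each peak its own label

labels = watershed(-distance, markers, mask=mask)  # separated object labels
```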

Can Overfeat work on ResNet or Inception network architectures

I am familiar with the principle of how Overfeat works to not only classify but also localize an object in an image, by using only convolutional layers instead of fully connected layers at the end. However, every tutorial or explanation that I read talks about AlexNet or a very basic neural network consisting of a few consecutive convolutional layers followed by 2-3 fully connected layers to classify an image. My question is as follows: is it possible to modify a more complex network such as ResNet or Inception, which don't use the standard consecutive convolutional layer structure of AlexNet or VGG, in the same way?
Thanks
Welcome, and yes. Looking at a very simplified diagram like this, everything to the left of the split "FC" ('fully connected', or 'dense') arrows can be any kind of (what is typically called an) image classification network, such as those in Keras Applications, which includes VGG, ResNet, Inception, Xception, etc. For these kinds of networks, the input is obviously an image, and the output is sometimes called a 'feature map' (although that's a bit silly; have a look at the output and you'll understand, as it's typically far more akin to a post-modernist map than to a cartographic one).
So the answer to your question is yes: put any kind of network you want before the 'overfeat' ending thing, whether custom or otherwise, but know that it's intended to be some general convolutional reductionist model like ResNet, Inception, etc. Any kind of network that takes an image in and spits out a pooled or flattened (1 dimensional) form of a 'feature map' of 3 dimensions is what's apparently intended for this 'overfeat' concept.
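As a small sketch of that, assuming TensorFlow/Keras: an Overfeat-style fully convolutional head on top of a Keras Applications backbone. The layer widths and the 1000-class output are illustrative.

```python
import tensorflow as tf

# Any backbone without its dense top; spatial input size left unspecified (None)
# so the network can slide over larger images, which is the Overfeat trick.
backbone = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                           input_shape=(None, None, 3))

x = backbone.output                                            # 3D feature map
x = tf.keras.layers.Conv2D(1024, 1, activation="relu")(x)      # 'FC' layers rewritten as 1x1 convs
x = tf.keras.layers.Conv2D(1000, 1)(x)                         # per-location class scores
model = tf.keras.Model(backbone.input, x)
```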

How to design the embedding layer in a neural network in order to have better quality?

Recently I have been learning about the idea of the embedding layer in neural networks. The best explanation I found so far is here. The explanation there addresses well the core concept of why to use an embedding layer and how it works.
It also mentioned that our embedding will map similar words to similar regions, and thus the quality of our embedding representation is measured by how close a group of representations that were similar in the original space ends up being in the embedding space. But I really have no idea how to do that.
My question is: how do I design the weight matrix in order to have a better embedding representation that is customised for a specific dataset?
Any hint would be really helpful to me!
Thank you all!
Assuming you know some concepts of neural networks and Word2Vec, I will try to explain things briefly.
1. The weight matrix in the embedding layer is often randomly initialized, just like the weights in other types of neural network layers.
2. The weight matrix in the embedding layer transforms the sparse input into a dense vector, as explained in the post you mentioned.
3. The weight matrix in the embedding layer can be updated during the training process on your dataset via backpropagation.
Therefore, after training, the learned weight matrix should give you better representations of your specific data. Just as with word embeddings, more data often yields better representations in the embedding layer. Another factor is the number of dimensions (generally speaking, the higher the dimension, the more degrees of freedom the model has to learn representations of the features).
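A brief sketch of those three points, assuming TensorFlow/Keras (the vocabulary size, embedding dimension and binary labels are made up): the embedding matrix starts random, maps token ids to dense vectors, and is updated by backpropagation together with the rest of the network.

```python
import tensorflow as tf

vocab_size, embed_dim = 10_000, 64   # illustrative sizes

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),   # randomly initialised weight matrix: sparse ids -> dense vectors
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),      # toy binary task on top of the embeddings
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# model.fit(token_ids, labels, ...) updates the embedding matrix by backpropagation;
# afterwards model.layers[0].get_weights()[0] holds the learned vectors.
```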