How to convert datashader canvas.points to GeoTiff? - heatmap

I am interested in generating GeoTIFF images from a set of scattered points; the dataset typically contains between 500K and 2M points. I have been using datashader and geoviews to generate static images of the data, and that works fine. However, when I want to generate the GeoTIFF, there are two options:
To generate an RGB GeoTIFF with the coloring provided by the datashader Image color map.
To use the values inside the GeoTIFF and let the GeoTIFF server build the colormap.
We are interested in going with the second option, but I do not understand the values that I extract from the shade function:
import pandas as pd
import datashader as ds
import datashader.transfer_functions as tf
df = pd.DataFrame({"x": [0, 0.6, 1, 1.2, 2, 5], "y":[0, 0.6, 0.8, 1, 5, 6], "value":[1, 1, 1, 1, 0.5, 0.5]})
canvas = ds.Canvas(plot_width=100, plot_height=100, x_range=(0, 5), y_range=(0, 5), x_axis_type="linear", y_axis_type="linear")
shaded = tf.shade(canvas.points(df, "x", "y", ds.mean("value")))
As far as I know, I can extract the aggregated values from shade as a numpy array and use that array to save a GeoTIFF with one band, but the values in that array make no sense to me.
shaded.to_numpy().max()
4293318829
They are always huge numbers that have no apparent relationship to the aggregation process.
So the questions are: what do those values actually mean?
Is it possible to get the actual aggregated values?
Is it possible to extract the color mapping from an Image object so I can pass it to the server and have the GeoTIFF colored as I expect?

Datashader uses a couple of steps to produce the results you see. Most importantly, there is a step in which the aggregated values are mapped to RGB using the color_map input. After that, the colorize function inside datashader stacks the four RGBA uint8 arrays of shape (100, 100) into one uint32 (100, 100) array using this line:
From datashader/transfer_functions/__init__.py:
values = module.dstack([r, g, b, a]).view(module.uint32).reshape(a.shape)
That is why the values look so strange. To get the RGBA channels back, one can access the underlying data like this:
shaded.to_numpy().base.shape
(310, 400, 4)
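Alternatively, here is a minimal sketch (assuming the 100x100 canvas from the question) that views the packed uint32 image as four explicit uint8 channels:
import numpy as np

img = shaded.to_numpy()                              # packed uint32, shape (100, 100)
rgba = img.view(np.uint8).reshape(img.shape + (4,))  # back to (100, 100, 4)
r, g, b, a = np.moveaxis(rgba, -1, 0)                # four (100, 100) uint8 channels, in the r, g, b, a stacking order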

The answers to all of your questions can be found in the datashader documentation.
The actual aggregated values are stored in canvas.points(df, "x", "y", ds.mean("value")). This is an xarray object, and you can easily write it to netCDF or GeoTIFF, but written naively it loses the geographic reference / coordinate system information.
Assuming your dataset uses EPSG:3857 as its projected coordinate system, you can use the rioxarray package to add the coordinate system information and write the GeoTIFF (see https://corteva.github.io/rioxarray/latest/rioxarray.html#rioxarray.rioxarray.XRasterBase.write_crs), like this:
import rioxarray  # registers the .rio accessor on xarray objects
ag = canvas.points(df, "x", "y", ds.mean("value"))
ag.rio.write_crs("epsg:3857", inplace=True)
ag.rio.to_raster("../data/interim/test2.tif")
I think the color mapping should be the job of your server. To see what information a GeoTIFF can store, you can check here: https://www.bluemarblegeo.com/about-geotiff-format/.

Related

Do the multiple heads in Multi head attention actually lead to more parameters or different outputs?

I am trying to understand Transformers. While I understand the concept of the encoder-decoder structure and the idea behind self-attention, what I am stuck on is the "multi-head" part of the "MultiheadAttention layer".
Looking at this explanation https://jalammar.github.io/illustrated-transformer/, which I generally found very good, it appears that multiple weight matrices (one set of weight matrices per head) are used to transform the original input into the query, key and value, which are then used to calculate the attention scores and the actual output of the MultiheadAttention layer. I also understand the idea behind multiple heads: the individual attention heads can focus on different parts (as depicted in the link).
However, this seems to contradict other observations I have made:
In the original paper https://arxiv.org/abs/1706.03762, it is stated that the input is split into parts of equal size per attention head.
So, for example I have:
batch_size = 1
sequence_length = 12
embed_dim = 512 (I assume that the dimensions for query, key and value are equal)
Then the shape of my query, key and value would each be [1, 12, 512]
We assume we have two heads, so num_heads = 2
This results in a dimension per head of 512/2=256. According to my understanding this should result in the shape [1, 12, 256] for each attention head.
So, am I correct in assuming that this depiction https://jalammar.github.io/illustrated-transformer/ just does not display this factor appropriately?
Does the splitting of the input into different heads actually lead to different calculations in the layer, or is it just done to make computations faster?
I have looked at the implementation in torch.nn.MultiheadAttention and printed out the shapes at various stages during the forward pass through the layer. To me it appears that the operations are conducted in the following order:
Use the in_projection weight matrices to get the query, key and value from the original inputs. After this the shape for query, key and value is [1, 12, 512]. From my understanding the weights in this step are the parameters that are actually learned in the layer during training.
Then the shape is reshaped for the multiple heads into [2, 12, 256] (see the sketch after this list).
After this, the dot product between query and key is calculated, etc. The output of this operation has the shape [2, 12, 256].
Then the output of the heads is concatenated which results in the shape [12, 512].
The attention output is multiplied by the output projection weight matrix and we get [12, 1, 512] (the batch size and the sequence length are sometimes swapped around). Again, here we have weights that are trained inside the matrices.
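To illustrate the reshaping step mentioned above, here is a small sketch with made-up tensors (it only mirrors the shapes, not the actual PyTorch internals):
import torch

batch, seq_len, embed_dim, num_heads = 1, 12, 512, 2
head_dim = embed_dim // num_heads                     # 256
q = torch.randn(batch, seq_len, embed_dim)            # after the input projection
q = q.view(batch, seq_len, num_heads, head_dim)       # [1, 12, 2, 256]
q = q.transpose(1, 2)                                 # [1, 2, 12, 256]
q = q.reshape(batch * num_heads, seq_len, head_dim)   # [2, 12, 256], as observed
print(q.shape)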
I printed the shapes of the parameters in the layer for different values of num_heads and the number of parameters does not change (see the check sketched after this list):
First parameter: [1536,512] (The input projection weight matrix, I assume, 1536=3*512)
Second parameter: [1536] (The input projection bias, I assume)
Third parameter: [512,512] (The output projection weight matrix, I assume)
Fourth parameter: [512] (The output projection bias, I assume)
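A minimal sketch of such a check (assuming the default torch.nn.MultiheadAttention constructor with embed_dim=512):
import torch.nn as nn

# The total parameter count is the same for every choice of num_heads,
# because the heads only split the already-projected 512-dimensional vectors.
for num_heads in (1, 2, 8):
    mha = nn.MultiheadAttention(embed_dim=512, num_heads=num_heads)
    print(num_heads, sum(p.numel() for p in mha.parameters()))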
On this website https://towardsdatascience.com/transformers-explained-visually-part-3-multi-head-attention-deep-dive-1c1ff1024853, it is stated that this is only a "logical split". This seems to fit my own observations using the PyTorch implementation.
So does the number of attention heads actually change the values that are output by the layer and the weights learned by the model? The way I see it, the weights are not influenced by the number of heads.
Then how can multiple heads focus on different parts (similar to the filters in convolutional layers)?

How to create a Pytorch network with mixed categorical and continuous matrix input

I'm creating a neural network that will take a matrix of continuous values along with some categorical input represented as vectors of all the classes.
Now, I'm also looking to extract features from the matrix with convolution. But this would not be possible if I reduce the matrix to one dimension and concatenate it with the class vectors.
Is there a way to concatenate them together as a single input? Or do I have to create two separate input layers and then somehow join them after the convolution? If it's the latter, what function am I looking for?
The most common approach to creating continuous values from categorical data is nn.Embedding. It creates a learnable vector representation of the available classes, such that two similar classes (in a specific context) are closer to each other than two dissimilar classes.
When you have a vector of classes with size [v], the embedding creates a tensor of size [v, embedding_size], where each class is represented by a vector of length embedding_size.
import torch
import torch.nn as nn

num_classes = 4
embedding_size = 10
embedding = nn.Embedding(num_classes, embedding_size)
class_vector = torch.tensor([1, 0, 3, 3, 2])
embedded_classes = embedding(class_vector)
embedded_classes.size() # => torch.Size([5, 10])
How you combine them with your continuous matrix depends on your particular use case. If you just want a 1D vector, you can flatten and concatenate them. On the other hand, if the matrix has meaningful dimensions that you want to keep, you should decide which dimension it makes sense to concatenate on and adapt the embedding_size such that they can be concatenated.
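For example, here is a minimal sketch of the flatten-and-concatenate case, reusing embedded_classes from above (the continuous matrix shape is made up for illustration):
continuous = torch.randn(5, 6, 4)                     # hypothetical continuous matrix per sample
flat = continuous.reshape(continuous.size(0), -1)     # [5, 24]
combined = torch.cat([flat, embedded_classes], dim=1) # concatenate along the feature dimension
combined.size() # => torch.Size([5, 34])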

One Hot Encoding dimension - Model Complexity

I will explain my problem:
I have around 50,000 samples, each one described by a list of codes representing "events".
The number of unique codes is around 800.
The maximum number of codes a sample can have is around 600.
I want to represent each sample using one-hot encoding. The representation should be, if we pad the samples that have fewer codes, an 800x600 matrix.
Giving this new representation as input to a network means flattening each matrix into a vector of size 800x600 (480,000 values).
In the end, the dataset would consist of 50,000 vectors of size 480,000.
Now, I have two considerations:
How is it possible to handle a dataset of that size? (I tried data generators to obtain the representation on the fly, but they are really slow.)
Having a vector of size 480,000 as input for each sample means that the complexity of my model (the number of parameters to learn) is extremely high (around 15,000,000 in my case), and so I need a huge dataset to train the model properly. Don't I?
Why don't you use the conventional models used in NLP?
These events can be translated, as you say, through an embedding matrix.
Then you can represent the chains of events using an LSTM (or GRU, RNN, or bidirectional LSTM); the difference between using an LSTM and a conventional network is that you use the same module repeated N times.
So your input is not really 480,000 values wide; internally, an event A indirectly helps you learn about an event B, because the LSTM has a module that repeats itself for each event in the chain.
You have an example here:
https://www.kaggle.com/ngyptr/lstm-sentiment-analysis-keras
Broadly speaking, what I would do is the following (in Keras pseudo-code):
Detect the total number of events and generate a unique list:
unique_events = list(set([event_0, ..., event_n]))
You can perform the translation of a sequence with:
seq_events_idx = list(map(unique_events.index, seq_events))
Add the necessary padding to each sequence:
sequences_pad = pad_sequences(sequences, maxlen=max_seq)
Then you can directly use an Embedding layer to map each event to an associated vector of whatever dimension you consider appropriate:
input_ = Input(shape=(max_seq,), dtype='int32')
embedding = Embedding(len(unique_events),
                      dimensions,
                      input_length=max_seq,
                      trainable=True)(input_)
Then you define the architecture of your LSTM (For example):
lstm = LSTM(128, input_shape=(max_seq, dimensions), dropout=0.2, recurrent_dropout=0.2, return_sequences=True)(embedding)
Add the Dense layer with the output you want:
out = Dense(10, activation='softmax')(lstm)
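Below is a minimal sketch of how these pieces could be assembled into a trainable model (my own addition, with the imports the pseudo-code above assumes, in Keras 2.x style). Note that with return_sequences=True the Dense layer is applied per time step; set it to False if you want a single prediction per sequence.
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model

model = Model(inputs=input_, outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()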
I think this type of model can help you and give you better results.

Defining a Keras function

I have recently started to learn deep learning and CNNs. I have come across the following code, which defines a simple CNN.
Can anyone help me to understand how these lines work:
loss = layer_output[:, :, :, 0] - What is the result of this? The network has not been trained yet and the weights (kernels) have not been calculated, so what data is it going to return? Does the 0 represent the first kernel?
iterate = K.function([input_img], [loss, grads]) - There is not much documentation available on the Keras site. What I understand is that iterate is a function which takes an input tensor and returns a list of tensors, the first one being loss and the second one grads. But they are defined elsewhere!
Define an input image with these dimensions:
img_data = np.random.uniform(size=(1, 250, 250, 3))
There is a simple CNN, which has one convolutional layer. It uses two 3x3 kernels.
input = Input(shape=(250, 250, 3), name='input_1')
First_Conv2D = Conv2D(2, kernel_size=(3, 3), padding="same", name='conv2d_1', activation='relu')(input)
flat = Flatten(name='flatten_1')(First_Conv2D)
output = Dense(2, name='dense_1', activation='softmax')(flat)
model = Model(inputs=[input], outputs=[output])
layer_dict = dict([(layer.name, layer) for layer in model.layers[0:]])
layer_output = layer_dict['conv2d_1'].output
input_img = model.input
# Calculate loss and gradient.
loss = layer_output[:, :, :, 0]
grads = K.gradients(loss, input_img)[0]
# Define a Keras function
iterate = K.function([input_img], [loss, grads])
# Call iterate function
loss_value, grads_value = iterate([img_data])
Thank You.
This looks like a nasty dissection of Keras as an API; I reckon it leads to more confusion than serving as an introduction to deep learning. Anyway, addressing your questions:
All tensors are symbolic, meaning that until we run a session they do not contain any values; instead they define a directed computation graph. loss = layer_output[:, :, :, 0] is a slicing operation that takes the first element of the last dimension, returning another tensor with 3 dimensions. Only when you run the session with actual inputs do the tensors take on values and the operations actually execute. The operations are almost identical to those on NumPy ndarrays (which are not symbolic and do contain values), so you can get an intuition from there.
K.function just glues the inputs to the outputs, returning a single operation that, when given the inputs, follows the computation graph from those inputs to the defined outputs. In this case, given a list with a single input, it returns a list of two outputs, the loss and the gradients. Remember that these are still symbolic: if you try to print one you'll just get a description of what it is, its shape and its data type.
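As a tiny illustration (my own sketch, assuming the same TF1-era Keras backend that K.gradients and K.function in the question imply):
import numpy as np
from keras import backend as K

x = K.placeholder(shape=(None, 3))   # symbolic: no values yet
y = x * 2                            # also symbolic
double = K.function([x], [y])        # glue inputs to outputs

inp = np.array([[1.0, 2.0, 3.0]])    # real values appear only when we call the function
print(double([inp]))                 # -> [array([[2., 4., 6.]], dtype=float32)]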

Box and whisker from CSV with Octave

I'm totally new to Octave but thought I'd give it a try since I need to create a box and whisker plot from a raster image with height values.
I've managed to export my GeoTIFF image to some sort of CSV file; it can be accessed from here, and it uses "." for decimals and ";" as the delimiter between cells.
When I run dlmread("test.csv", ";", 0, 0), the results indicate that the data is split up into multiple columns. On top of that I get zero values (0) which aren't present in test.csv.
First of all, I was under the impression that to create a box and whisker plot I needed to have the data in one column, not a couple of hundred columns like in this case. And secondly, what am I doing wrong to end up with all these zeroes?
Could someone point out how to properly import the above CSV into Octave? And if you feel really generous, I would be very thankful if you could also help me create a box and whisker plot from the attached data.
I'm using Octave 4.2.1 x86_64 on Windows 10 home.
It's really difficult to figure out what you really want, and it might be much easier to use the GeoTIFF directly without going through multiple (and obscure) conversions.
But here is a wild guess:
pkg load statistics                  # boxplot comes from the statistics package
s = urlread ("https://drive.google.com/uc?export=download&id=1RzJ-EO0OXgfMmMRG8wiCBz-51RcwSM5h");
o = str2double (strsplit (s, ";"));  # split on ";" and convert to numbers
o(isnan (o)) = [];                   # drop entries that failed to parse
subplot (3, 1, 1);
plot (o)
grid on
subplot (3, 1, 2);
hist (o, 100);
subplot (3, 1, 3);
boxplot (o)
print out.png
This gives you the raw data, the histogram and a boxplot showing the center, spread, departure from symmetry and the whiskers.