How to calculate separable convolutions in the Xception paper? - deep-learning

I read the Xception paper (there's even a Keras model for the described network) and it talks about separable convolutions.
I was trying to understand how exactly they are calculated. Rather than leaving it to imprecise words, I have included the piece of pseudo-code below that summarizes my understanding. The code maps an 18x18x728 feature map to an 18x18x1024 one:
XSIZE = 18;
YSIZE = 18;
ZSIZE = 728;
ZSIZE2 = 1024;
float mapin[XSIZE][YSIZE][ZSIZE];    // Input map
float imap[XSIZE][YSIZE][ZSIZE2];    // Intermediate map
float mapout[XSIZE][YSIZE][ZSIZE2];  // Output map
float wz[ZSIZE][ZSIZE2];             // Weights for 1x1 convs
float wxy[3][3][ZSIZE2];             // Weights for 3x3 convs
// Apply 1x1 convs
for(y=0; y<YSIZE; y++)
  for(x=0; x<XSIZE; x++)
    for(o=0; o<ZSIZE2; o++){
      s = 0.0;
      for(z=0; z<ZSIZE; z++)
        s += mapin[x][y][z] * wz[z][o];
      imap[x][y][o] = s;
    }
// Apply 2D 3x3 convs
for(o=0; o<ZSIZE2; o++)
  for(y=0; y<YSIZE; y++)
    for(x=0; x<XSIZE; x++){
      s = 0.0;
      for(i=-1; i<2; i++)
        for(j=-1; j<2; j++)
          s += imap[x+j][y+i][o] * wxy[j+1][i+1][o]; // This term is 0 if it falls off the edge
      mapout[x][y][o] = s;
    }
Is this correct? If not, can you suggest fixes, similarly written in C or pseudo-C?
Thank you very much in advance.

I found tf.nn.separable_conv2d in TensorFlow, which does exactly this. So I built a very simple graph and, using random inputs, tried to get the code above to match its result. The correct code is:
XSIZE = 18;
YSIZE = 18;
ZSIZE = 728;
ZSIZE2 = 1024;
float mapin[XSIZE][YSIZE][ZSIZE];    // Input map
float imap[XSIZE][YSIZE][ZSIZE];     // Intermediate map
float mapout[XSIZE][YSIZE][ZSIZE2];  // Output map
float wxy[3][3][ZSIZE];              // Weights for 3x3 convs
float wz[ZSIZE][ZSIZE2];             // Weights for 1x1 convs
// Apply 2D 3x3 convs
for(o=0; o<ZSIZE; o++)
  for(y=0; y<YSIZE; y++)
    for(x=0; x<XSIZE; x++){
      s = 0.0;
      for(i=-1; i<2; i++)
        for(j=-1; j<2; j++)
          s += mapin[x+j][y+i][o] * wxy[j+1][i+1][o]; // This term is 0 if it falls off the edge
      imap[x][y][o] = s;
    }
// Apply 1x1 convs
for(y=0; y<YSIZE; y++)
  for(x=0; x<XSIZE; x++)
    for(o=0; o<ZSIZE2; o++){
      s = 0.0;
      for(z=0; z<ZSIZE; z++)
        s += imap[x][y][z] * wz[z][o];
      mapout[x][y][o] = s;
    }
The main difference is the order in which the two groups of convolutions are performed: the depthwise 3x3 convolutions come first, followed by the 1x1 convolutions.
To my surprise, the order matters even when ZSIZE == ZSIZE2.
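As a side note, the separable factorization is also much cheaper than a regular convolution. A quick back-of-the-envelope sketch (rough arithmetic only, counting multiply-accumulates per output position for the shapes above):

#include <cstdio>

// Rough cost comparison (multiply-accumulates per output position) for the
// 18x18x728 -> 18x18x1024 mapping above. Illustrative arithmetic only.
int main() {
    const long ZSIZE = 728, ZSIZE2 = 1024;
    const long regular   = 3 * 3 * ZSIZE * ZSIZE2;          // full 3x3 convolution
    const long separable = 3 * 3 * ZSIZE + ZSIZE * ZSIZE2;   // 3x3 depthwise + 1x1 pointwise
    std::printf("regular:   %ld MACs\n", regular);           // 6709248
    std::printf("separable: %ld MACs\n", separable);         // 752024
    std::printf("ratio:     %.1fx\n", (double)regular / separable); // ~8.9x
    return 0;
}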

Related

3D binary image to 3D mesh using itk

I'm trying to generate a 3D mesh from a 3D RLE binary mask.
In ITK, I found a class named itkBinaryMask3DMeshSource.
It's based on the Marching Cubes algorithm.
Some examples, such as ExtractIsoSurface, use this class.
In my case, I have an RLE 3D binary mask, but it is represented in 1D vector format.
I'm writing a function for this task.
My function takes as parameters:
Inputs: crle 1D vector (computed RLE), dimension Int3
Output: coords + coord indices (or a single file containing both of those arrays, which I can then use to visualize the mesh)
As a first step, I decode the computed RLE.
Next, I use an image iterator to create an image compatible with BinaryMask3DMeshSource.
I'm blocked at the last step.
This is my code :
void GenerateMeshFromCrle(const std::vector<int>& crle, const Int3& dim,
                          std::vector<float>* coords, std::vector<int>* coord_indices, int* nodes,
                          int* cells, const char* outputmeshfile) {
  std::vector<int> mask(crle.back());
  CrleDecode(crle, mask.data());
  // Here we define our ITK image type with 3 dimensions
  using ImageType = itk::Image< unsigned char, 3 >;
  ImageType::Pointer image = ImageType::New();
  // An image is defined by a start index and a size for each axis
  // By default, we set the start index to x=0, y=0, z=0
  ImageType::IndexType start;
  start[0] = 0; // first index on X
  start[1] = 0; // first index on Y
  start[2] = 0; // first index on Z
  // until here, no problem
  // We set the image size on x,y,z from the dim input parameter
  // itk takes Z Y X
  ImageType::SizeType size;
  size[0] = dim.z; // size along X
  size[1] = dim.y; // size along Y
  size[2] = dim.x; // size along Z
  ImageType::RegionType region;
  region.SetSize(size);
  region.SetIndex(start);
  image->SetRegions(region);
  image->Allocate();
  // Set the pixels to the values from the RLE
  // This is a fast way
  itk::ImageRegionIterator<ImageType> imageIterator(image, region);
  int n = 0;
  while (!imageIterator.IsAtEnd() && n < mask.size()) {
    // Set the current pixel to the value from the RLE
    imageIterator.Set(mask[n]);
    ++imageIterator;
    ++n;
  }
  // In this step, we launch itkBinaryMask3DMeshSource
  using BinaryThresholdFilterType = itk::BinaryThresholdImageFilter< ImageType, ImageType >;
  BinaryThresholdFilterType::Pointer threshold =
      BinaryThresholdFilterType::New();
  threshold->SetInput(image->GetOutput()); // here it's an error, since image has no GetOutput member
  threshold->SetLowerThreshold(0);
  threshold->SetUpperThreshold(1);
  threshold->SetOutsideValue(0);
  using MeshType = itk::Mesh< double, 3 >;
  using FilterType = itk::BinaryMask3DMeshSource< ImageType, MeshType >;
  FilterType::Pointer filter = FilterType::New();
  filter->SetInput(threshold->GetOutput());
  filter->SetObjectValue(1);
  using WriterType = itk::MeshFileWriter< MeshType >;
  WriterType::Pointer writer = WriterType::New();
  writer->SetFileName(outputmeshfile);
  writer->SetInput(filter->GetOutput());
}
Any idea? I appreciate your time.
Since image is not a filter, you can plug it in directly: threshold->SetInput(image);. At the end of the function you also need writer->Update();. The rest looks good.
Side note: it looks like you might benefit from using an import filter instead of manually iterating over the buffer and copying the values one at a time.
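To make that concrete, here is a minimal sketch of how the tail of the function could look with those two fixes applied (assuming the same image, threshold, filter and writer objects declared in the question):

// Sketch of the corrected pipeline tail (same objects as in the question).
threshold->SetInput(image);           // image is a plain data object, not a filter
threshold->SetLowerThreshold(0);
threshold->SetUpperThreshold(1);
threshold->SetOutsideValue(0);
filter->SetInput(threshold->GetOutput());
filter->SetObjectValue(1);
writer->SetFileName(outputmeshfile);
writer->SetInput(filter->GetOutput());
writer->Update();                     // runs the whole pipeline and writes the mesh file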

In Starling, how do you transform Filters to match the target Sprite's rotation & position?

Let's say your Starling display-list is as follows:
Stage
|___MainApp
|______Canvas (filter's target)
Then, you decide your MainApp should be rotated 90 degrees and offset a bit:
mainApp.rotation = Math.PI * 0.5;
mainApp.x = stage.stageWidth;
But all of a sudden, the filter keeps applying itself to the target (canvas) at the angle it was originally at (as if the MainApp was still at 0 degrees).
(Notice in the GIF how the Blur's strong horizontal value continues to apply only horizontally although the parent object turned 90 degrees.)
What would need to be changed so that the filter is applied to the target object before it gets its parent's transform? That way (I'm assuming) the filter's result would be transformed by the parent objects.
Any guess as to how this could be done?
https://github.com/bigp/StarlingShaderIssue
(PS: the filter I'm actually using is custom-made, but this BlurFilter example shows the same issue I'm having with the custom one. If there's any patching-up to do in the shader code, at least it wouldn't necessarily have to be done on the built-in BlurFilter specifically).
I solved this myself after numerous trial-and-error attempts over the course of several hours.
Since I only needed the shader to run at either 0 or 90 degrees (not actually tweened like the GIF demo shown in the question), I created a shader with two specialized sets of AGAL instructions.
Without going into too much detail, the rotated version basically requires a few extra instructions to swap the x and y fields in the vertex and fragment shaders (either by moving them with mov or by writing the mul or div result directly into the x or y field).
For example, compare the 0 deg vertex shader...
_vertexShader = [
"m44 op, va0, vc0", // 4x4 matrix transform to output space
"mov posOriginal, va1", // pass texture positions to fragment program
"mul posScaled, va1, viewportScale", // pass displacement positions (scaled)
].join("\n");
... with the 90 deg vertex shader:
_vertexShader = [
"m44 op, va0, vc0", // 4x4 matrix transform to output space
"mov posOriginal, va1", // pass texture positions to fragment program
//Calculate the rotated vertex "displacement" UVs
"mov temp1, va1",
"mov temp2, va1",
"mul temp2.y, temp1.x, viewportScale.y", //Flip X to Y, and scale with viewport Y
"mul temp2.x, temp1.y, viewportScale.x", //Flip Y to X, and scale with viewport X
"sub temp2.y, 1.0, temp2.y", //Invert the UV for the Y axis.
"mov posScaled, temp2",
].join("\n");
(You can ignore the special aliases in the AGAL example; they're essentially the varying registers v0 and v1 for posOriginal and posScaled, and the constant register vc4 for viewportScale. I do a string-replace to change them back to their respective registers & fields.)
Just a human-readable trick I use to avoid going insane. \☻/
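If it helps to see the same idea outside of AGAL, the 90-degree branch boils down to a swap-and-flip of the displacement UVs. A plain C++ sketch of the same math (the names Vec2, uv and viewportScale are just illustrative, not from the project):

struct Vec2 { float x, y; };

// Same math as the rotated AGAL vertex shader above, in plain code.
// uv is the incoming texture coordinate, viewportScale the vc4 constant.
Vec2 rotatedDisplacementUV(Vec2 uv, Vec2 viewportScale) {
    Vec2 scaled;
    scaled.y = uv.x * viewportScale.y;  // X becomes Y, scaled by the viewport Y factor
    scaled.x = uv.y * viewportScale.x;  // Y becomes X, scaled by the viewport X factor
    scaled.y = 1.0f - scaled.y;         // invert the UV along the Y axis
    return scaled;
}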
The part I struggled with the most was calculating the correct scale to adjust the UVs (with proper handling of Stage/Viewport resizes and render-texture size shifts).
Eventually, this is what I came up with in the AS3 code:
var pt:Texture = _passTexture,
    dt:RenderTexture = _displacement.texture,
    notReady:Boolean = pt == null,
    star:Starling = Starling.current;
var finalScaleX:Number, viewRatioX:Number = star.viewPort.width / star.stage.stageWidth;
var finalScaleY:Number, viewRatioY:Number = star.viewPort.height / star.stage.stageHeight;
if (notReady) {
    finalScaleX = finalScaleY = 1.0;
} else if (isRotated) {
    //NOTE: Notice how the native width is divided by the height, instead of the same side. Weird, but it works!
    finalScaleY = pt.nativeWidth / dt.nativeHeight / _imageRatio / paramScaleX / viewRatioX; //Eureka!
    finalScaleX = pt.nativeHeight / dt.nativeWidth / _imageRatio / paramScaleY / viewRatioY; //Eureka x2!
} else {
    finalScaleX = pt.nativeWidth / dt.nativeWidth / _imageRatio / viewRatioX / paramScaleX;
    finalScaleY = pt.nativeHeight / dt.nativeHeight / _imageRatio / viewRatioY / paramScaleY;
}
Hopefully these extracted pieces of code can be helpful to others with similar shader issues.
Good luck!

Handling Boundary Conditions in OpenCL/CUDA

Given a 3D uniform grid, I would like to set the values of the border cells relative to the values of their nearest neighbors inside the grid. E.g., given a 10x10x10 grid, for a voxel at coordinate (0, 8, 8) I'd like to set its value as follows: val(0, 8, 8) = a * val(1, 8, 8).
Since a could be any real number, I do not think textures + samplers can be used in this case. In addition, the method should work on normal buffers as well.
Also, since a boundary voxel can lie on one of the grid's corners, edges, or faces, there are 26 (= 8 + 12 + 6) different cases for looking up the nearest neighbor (e.g. if the coordinate were (0,0,0), its nearest neighbor inside the grid would be (1, 1, 1)). So there is a lot of potential branching.
Is there an "elegant" way to accomplish this in OpenCL/CUDA? Also, is it advisable to handle the boundary in a separate kernel?
The most usual way of handling borders in CUDA is to check for all possible border conditions and act accordingly, that is:
If "this element" is out of bounds, then return (this is very useful in CUDA, where you will probably launch more threads than strictly necessary, so the extra threads must exit early in order to avoid writing on out-of-bounds memory).
If "this element" is at/near left border (minimum x) then do special operations for left border.
Same for right, up, down (and front and back, in 3D) borders.
Fortunately, on most occasions you can use max/min to simplify these operations, so you avoid too many ifs. I like to use an expression of this form:
source_pixel_x = max(0, min(thread_2D_pos.x + j, MAX_X));
source_pixel_y = ... // you get the idea
The result of these expressions is always bounded between 0 and some MAX, thus clamping the out-of-bounds source pixels to the border pixels.
EDIT: As commented by DarkZeros, it is easier (and less error-prone) to use the clamp() function. Not only does it check both min and max, it also accepts vector types like float3 and clamps each dimension separately. See: clamp
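For illustration, here is the same clamping written as a single clamp call. The sketch uses C++17's std::clamp; the OpenCL built-in clamp() behaves analogously and additionally accepts vector types such as float3:

#include <algorithm>  // std::clamp (C++17)

// Equivalent of max(0, min(pos + offset, maxIndex)): keep the source
// coordinate inside the valid range [0, maxIndex] with one call.
int clampedCoord(int pos, int offset, int maxIndex) {
    return std::clamp(pos + offset, 0, maxIndex);
}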
Here is an example I did as an exercise, a 2D gaussian blur:
__global__
void gaussian_blur(const unsigned char* const inputChannel,
                   unsigned char* const outputChannel,
                   int numRows, int numCols,
                   const float* const filter, const int filterWidth)
{
  const int2 thread_2D_pos = make_int2( blockIdx.x * blockDim.x + threadIdx.x,
                                        blockIdx.y * blockDim.y + threadIdx.y);
  const int thread_1D_pos = thread_2D_pos.y * numCols + thread_2D_pos.x;
  if (thread_2D_pos.x >= numCols || thread_2D_pos.y >= numRows)
  {
    return; // "this output pixel" is out-of-bounds. Do not compute
  }
  int j, k, jn, kn, filterIndex = 0;
  float value = 0.0;
  int2 pixel_2D_pos;
  int pixel_1D_pos;
  // Now we'll process input pixels.
  // Note the use of max(0, min(thread_2D_pos.x + j, numCols-1)),
  // which is a way to clamp the coordinates to the borders.
  for(k = -filterWidth/2; k <= filterWidth/2; ++k)
  {
    pixel_2D_pos.y = max(0, min(thread_2D_pos.y + k, numRows-1));
    for(j = -filterWidth/2; j <= filterWidth/2; ++j, ++filterIndex)
    {
      pixel_2D_pos.x = max(0, min(thread_2D_pos.x + j, numCols-1));
      pixel_1D_pos = pixel_2D_pos.y * numCols + pixel_2D_pos.x;
      value += ((float)(inputChannel[pixel_1D_pos])) * filter[filterIndex];
    }
  }
  outputChannel[thread_1D_pos] = (unsigned char)value;
}
In OpenCL you could use a 3D image (image3d_t) to handle your 3D grid. Boundary handling could be achieved with a sampler and a specific address mode:
CLK_ADDRESS_REPEAT - out-of-range image coordinates are wrapped to the valid range. This address mode can only be used with normalized coordinates. If normalized coordinates are not used, this addressing mode may generate image coordinates that are undefined.
CLK_ADDRESS_CLAMP_TO_EDGE - out-of-range image coordinates are clamped to the extent.
CLK_ADDRESS_CLAMP - out-of-range image coordinates will return a border color. The border color is (0.0f, 0.0f, 0.0f, 0.0f) if the image channel order is CL_A, CL_INTENSITY, CL_RA, CL_ARGB, CL_BGRA or CL_RGBA and is (0.0f, 0.0f, 0.0f, 1.0f) if the image channel order is CL_R, CL_RG, CL_RGB or CL_LUMINANCE.
CLK_ADDRESS_NONE - for this address mode the programmer guarantees that the image coordinates used to sample elements of the image refer to a location inside the image; otherwise the results are undefined.
Additionally you can define the filter mode for the interpolation (nearest neighbor or linear).
Does this fit your needs? Otherwise, please give us more detail about your data and its boundary requirements.

clarify some things about culasparse

Checking this example (API example at the end), I want to ask a few questions.
1) In the example we are supplying matrix a with non-zero elements. What is the real size of the matrix though? And are these the elements of the matrix or the positions that contain non-zero elements?
2) Can I use in the calculations (in a function like culaSparseSetDcooData) a matrix A which contains both zero and non-zero elements?
If I want to create a sample matrix just to test, do I have to create a matrix with zero elements, then fill it with some elements, and then what?
Regarding 1): Interestingly, the size of the matrix in COO format is not explicitly specified: it consists of the coordinates of the non-zero elements of the matrix. If you have a COO matrix with 1 non-zero element, then this could be
double a[1] = { 1.0 };
int colInd[1] = { 10 };
int rowInd[1] = { 20 };
and (as you can tell from the row/column indices) describe an element of a matrix that has at least 21 rows and 11 columns, or it could be
double a[1] = { 1.0 };
int colInd[1] = { 1000 };
int rowInd[1] = { 2000 };
and describe an element of a matrix that has at least 2001 rows and 1001 columns.
However, in this example it seems to be a square matrix, and n=8 seems to be the size. (Unfortunately, there seems to be no detailed documentation of the culaSparseSetDcooData function...)
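As a small, hypothetical illustration of the COO format (not taken from the CULA example, and assuming zero-based indices as above): a 3x3 matrix with four non-zero entries would be described by three parallel arrays, one value plus one (row, column) coordinate per entry:

// Hypothetical 3x3 matrix, just to illustrate the COO format:
//
//         | 1 0 2 |
//     A = | 0 3 0 |
//         | 0 0 4 |
//
// Only the non-zero entries are stored, each with its row/column coordinate.
double a[4]      = { 1.0, 2.0, 3.0, 4.0 };
int    rowInd[4] = { 0,   0,   1,   2   };
int    colInd[4] = { 0,   2,   1,   2   };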
Regarding 2) This is not entirely clear. If your question is whether the "non-zero" values may (in reality) have a value of 0.0, then I can say: Yes, this should be allowed. However, the example that you referred to already shows how to create a simple test matrix.

Best line equation to use for computational geometry

I'm looking to write a little comp-geom library, in Ruby.
I'm about to write the code for lines, and was wondering which line equation I should use:
ax + by + c = 0
r + tv (where r and v are vectors)
Thanks.
If using the classical equations is not a requirement, I'd suggest an array of four co-ordinates: xStart, yStart, xEnd and yEnd.
If you need the line's position to be dynamic, you could use an array of two parameters: alpha and radius. The former represents the rotation relative to the horizontal axis and the latter is the length of the line.
Yet another option would be vectors in the form of (X;Y).
Samples in C:
int endpointsLine[4] = {0, 0, 30, 40};
double radialLine[2] = {5.35589, 50};
int vectorLine[2] = {30, 40};
The "endpoints" format is fully compatible with modern line-drawing algorithms, such as Xiaolin Wu's line algorithm and Bresenham's line algorithm but it represents specific screen co-ordinates which is not the case with "radial" and "vector" format.