3D binary image to 3D mesh using itk - itk

I'm trying to generate a 3d mesh using 3d RLE binary mask.
In itk, I find a class named itkBinaryMask3DMeshSource
it's based on MarchingCubes algorithm
some example, use this class, ExtractIsoSurface et ExtractIsoSurface
in my case, I have a rle 3D binary mask but represented in 1d vector format.
I'm writing a function for this task.
My function takes as parameters :
Inputs : crle 1d vector ( computed rle), dimension Int3
Output : coord + coord indices ( or generate a single file contain both of those array; and next I can use them to visualize the mesh )
as a first step, I decoded this computed rle.
next, I use imageIterator to create an image compatible to BinaryMask3DMeshSource.
I'm blocked, in the last step.
This is my code :
void GenerateMeshFromCrle(const std::vector<int>& crle, const Int3 & dim,
std::vector<float>* coords, std::vector<int>*coord_indices, int* nodes,
int* cells, const char* outputmeshfile) {
std::vector<int> mask(crle.back());
CrleDecode(crle, mask.data());
// here we define our itk Image type with a 3 dimension
using ImageType = itk::Image< unsigned char, 3 >;
ImageType::Pointer image = ImageType::New();
// an Image is defined by start index and size for each axes
// By default, we set the first start index from x=0,y=0,z=0
ImageType::IndexType start;
start[0] = 0; // first index on X
start[1] = 0; // first index on Y
start[2] = 0; // first index on Z
// until here, no problem
// We set the image size on x,y,z from the dim input parameters
// itk takes Z Y X
ImageType::SizeType size;
size[0] = dim.z; // size along X
size[1] = dim.y; // size along Y
size[2] = dim.x; // size along Z
ImageType::RegionType region;
region.SetSize(size);
region.SetIndex(start);
image->SetRegions(region);
image->Allocate();
// Set the pixels to value from rle
// This is a fast way
itk::ImageRegionIterator<ImageType> imageIterator(image, region);
int n = 0;
while (!imageIterator.IsAtEnd() && n < mask.size()) {
// Set the current pixel to the value from rle
imageIterator.Set(mask[n]);
++imageIterator;
++n;
}
// In this step, we launch itkBinaryMask3DMeshSource
using BinaryThresholdFilterType = itk::BinaryThresholdImageFilter< ImageType, ImageType >;
BinaryThresholdFilterType::Pointer threshold =
BinaryThresholdFilterType::New();
threshold->SetInput(image->GetOutput()); // here it's an error, since no GetOutput member for image
threshold->SetLowerThreshold(0);
threshold->SetUpperThreshold(1);
threshold->SetOutsideValue(0);
using MeshType = itk::Mesh< double, 3 >;
using FilterType = itk::BinaryMask3DMeshSource< ImageType, MeshType >;
FilterType::Pointer filter = FilterType::New();
filter->SetInput(threshold->GetOutput());
filter->SetObjectValue(1);
using WriterType = itk::MeshFileWriter< MeshType >;
WriterType::Pointer writer = WriterType::New();
writer->SetFileName(outputmeshfile);
writer->SetInput(filter->GetOutput());
}
any idea
I appreciate your time.

Since image is not a filter you can plug it in directly: threshold->SetInput(image);. At the end of this function, you also need writer->Update();. The rest looks good.
Side-note: it looks like you might benefit from usage of import filter instead of manually iterating the buffer and copying values one at a time.

Related

Concurrent Writing CUDA

I am new to CUDA and I am facing a problem with a basic projection kernel. What I am trying to do is to project a 3D point cloud into a 2D image. In case multiple points project to the same pixel, only the point with the smallest depth (the closest one) should be written on the matrix.
Suppose two 3D points fall in an image pixel (0, 0), the way I am implementing the depth check here is not working if (depth > entry.depth), since the two threads (from two different blocks) execute this "in parallel". In the printf statement, in fact, both entry.depth give the numeric limit (the initialization value).
To solve this problem I thought of using a tensor-like structure, each image pixel corresponds to an array of values. After the array is reduced and only the point with the smallest depth is kept. Are there any smarter and more efficient ways of solving this problem?
__global__ void kernel_project(CUDAWorkspace* workspace_, const CUDAMatrix* matrix_) {
int tid = threadIdx.x + blockIdx.x * blockDim.x;
if (tid >= matrix_->size())
return;
const Point3& full_point = matrix_->at(tid);
float depth = 0.f;
Point2 image_point;
// full point as input, depth and image point as output
const bool& is_good = project(image_point, depth, full_point); // dst, dst, src
if (!is_good)
return;
const int irow = (int) image_point.y();
const int icol = (int) image_point.x();
if (!workspace_->inside(irow, icol)) {
return;
}
// get pointer to entry
WorkspaceEntry& entry = (*workspace_)(irow, icol);
// entry.depth is set initially to a numeric limit
if (depth > entry.depth) // PROBLEM HERE
return;
printf("entry depth %f\n", entry.depth) // BOTH PRINT THE NUMERIC LIMIT
entry.point = point;
entry.depth = depth;
}

How to interpret FFT data for making a spectrum visualizer

I am trying to visualize a spectrum where the frequency range is divided into N bars, either linearly or logarithmic. The FFT seems to work fine, but I am not sure how to interpret the values in order to decide the max height for the visualization.
I am using FMODAudio, a wrapper for C#. It's set up correctly.
In the case of a linear spectrum, the bars are defined as following:
public int InitializeSpectrum(int windowSize = 1024, int maxBars = 16)
{
numSamplesPerBar_Linear.Clear();
int barSamples = (windowSize / 2) / maxBars;
for (int i = 0; i < maxBars; ++i)
{
numSamplesPerBar_Linear.Add(barSamples);
}
IsInitialized = true;
Data = new float[numSamplesPerBar_Linear.Count];
return numSamplesPerBar_Linear.Count;
}
Data is the array which holds the spectrum values received from the update loop.
The update looks like this:
public unsafe void UpdateSpectrum(ref ParameterFFT* fftData)
{
int length = fftData->Length / 2;
if (length > 0)
{
int indexFFT = 0;
for (int index = 0; index < numSamplesPerBar_Linear.Count; ++index)
{
for (int frec = 0; frec < numSamplesPerBar_Linear[index]; ++frec)
{
for (int channel = 0; channel < fftData->ChannelCount; ++channel)
{
var floatspectrum = fftData->GetSpectrum(channel); //this is a readonlyspan<float> by default.
Data[index] += floatspectrum[indexFFT];
}
++indexFFT;
}
Data[index] /= (float)(numSamplesPerBar_Linear[index] * fftData->ChannelCount); // average of both channels for more meaningful values.
}
}
}
The values I get when testing a song are very low across the bands.
A randomly chosen moment when playing a song gives these values:
16 bars = 0,0326 0,0031 0,001 0,0003 0,0004 0,0003 0,0001 0,0002 0,0001 0,0001 0,0001 0 0 0 0 0
I realize it's more useful to use a logarithmic spectrum in many cases, and I intend to, but I still need to figure how how to find the max values for each bar so that I can setup the visualization on a proper scale.
Q: How can I know the potential max values for each bar based on this setup (it's not 1.0)?
output from FFT call is an array where each element is a complex number ( A + Bi ) where A is the real number component and B the imaginary number component ... element zero of this array represents frequency zero as in DC which is the offset bias can typically be ignored ... as you iterate across each element of this array you increment the frequency ... this freq increment is calculated using
Audio_samples <-- array of raw audio samples in PCM format which gets
fed into FFT call
num_fft_bins := float64(len(Audio_samples)) / 2.0 // using Nyquist theorem
freq_incr_per_bin := (input_audio_sample_rate / 2.0) / num_fft_bins
so to answer your question the output array from FFT call is a linear progression evenly spaced based in above freq increment constant
Depends on your input data to the FFT, and the scaling that your particular FFT implementation uses (not all FFTs use the same scale factor).
With an energy preserving forward-FFT, Parseval's theorem applies. So the energy (sum of squares) of the input vector equals the energy of the FFT result vector. Note that for a single integer periodic in aperture sinusoidal input (a pure tone), all that energy can appear in a single FFT result element. So if you know the maximum possible input energy, you can use that to compute the maximum possible result element magnitude for scaling purposes.
The range is often large enough that visualizers commonly need to use log scaling, or else typical input can get pixel quantized to a graph of all zeros.

How to calculate separable convolutions in the Xception paper?

I read the Xception paper (there's even a Keras Model for the describe NN) and it talks about separable convolutions.
I was trying to understand how exactly they are calculated. Rather than leaving it to imprecise words, I have included the piece of pseudo-code below that summarizes my understanding. The code maps from a feature map 18x18x728 to a 18x18x1024 one :
XSIZE = 18;
YSIZE = 18;
ZSIZE = 728;
ZSIXE2 = 1024;
float mapin[XSIZE][YSIZE][ZSIZE]; // Input map
float imap[XSIZE][YSIZE][ZSIZE2]; // Intermediate map
float mapout[XSIZE][YSIZE][ZSIZE2]; // Output map
float wz[ZSIZE][ZSIZE2]; // Weights for 1x1 convs
float wxy[3][3][ZSIZE2]; // Weights for 3x3 convs
// Apply 1x1 convs
for(y=0;y<YSIZE;y++)
for(x=0;x<XSIZE;x++)
for(o=0;o<ZSIZE2;o++){
s=0.0;
for(z=0;z<ZSIZE;z++)
s+=mapin[x][y][z]*wz[z][o];
imap[x][y][o]=s;
}
// Apply 2D 3x3 convs
for(o=0;o<ZSIZE2;o++)
for(y=0y<YSIZE;y++)
for(x=0;x<XSIZE;x++){
s=0.0;
for(i=-1;i<2;i++)
for(j=-1;j<2;j++)
s+=imap[x+j][y+i][o]*wxy[j+1][i+1][o]; // This value is 0 if falls off the edge
mapout[x][y][o]=s;
}
Is this correct ? If not, can you suggest fixes similarly written in C or pseudo-C ?
Thank you very much in advance.
I found tf.nn.separable_conv2d in Tensorflow that does exactly this. So I built a very simple graph and, with the help of random numbers, I tried to get the code above to match the result. The correct code is :
XSIZE = 18;
YSIZE = 18;
ZSIZE = 728;
ZSIXE2 = 1024;
float mapin[XSIZE][YSIZE][ZSIZE]; // Input map
float imap[XSIZE][YSIZE][ZSIZE]; // Intermediate map
float mapout[XSIZE][YSIZE][ZSIZE2]; // Output map
float wxy[3][3][ZSIZE]; // Weights for 3x3 convs
float wz[ZSIZE][ZSIZE2]; // Weights for 1x1 convs
// Apply 2D 3x3 convs
for(o=0;o<ZSIZE;o++)
for(y=0y<YSIZE;y++)
for(x=0;x<XSIZE;x++){
s=0.0;
for(i=-1;i<2;i++)
for(j=-1;j<2;j++)
s+=mapin[x+j][y+i][o]*wxy[j+1][i+1][o]; // This value is 0 if falls off the edge
imap[x][y][o]=s;
}
// Apply 1x1 convs
for(y=0;y<YSIZE;y++)
for(x=0;x<XSIZE;x++)
for(o=0;o<ZSIZE2;o++){
s=0.0;
for(z=0;z<ZSIZE;z++)
s+=imap[x][y][z]*wz[z][o];
mapout[x][y][o]=s;
}
The main difference is in the order that the two groups of convolutions are performed.
To my surprise, the order is important even when ZSIZE==ZSIZE2.

C - pass array as parameter and change size and content

UPDATE: I solved my problem (scroll down).
I'm writing a small C program and I want to do the following:
The program is connected to a mysql database (that works perfectly) and I want to do something with the data from the database. I get about 20-25 rows per query and I created my own struct, which should contain the information from each row of the query.
So my struct looks like this:
typedef struct {
int timestamp;
double rate;
char* market;
char* currency;
} Rate;
I want to pass an empty array to a function, the function should calculate the size for the array based on the returned number of rows of the query. E.g. there are 20 rows which are returned from a single SQL query, so the array should contain 20 objectes of my Rate struct.
I want something like this:
int main(int argc, char **argv)
{
Rate *rates = ?; // don't know how to initialize it
(void) do_something_with_rates(&rates);
// the size here should be ~20
printf("size of rates: %d", sizeof(rates)/sizeof(Rate));
}
How does the function do_something_with_rates(Rate **rates) have to look like?
EDIT: I did it as Alex said, I made my function return the size of the array as size_t and passed my array to the function as Rate **rates.
In the function you can access and change the values like (*rates)[i].timestamp = 123 for example.
In C, memory is either dynamically or statically allocated.
Something like int fifty_numbers[50] is statically allocated. The size is 50 integers no matter what, so the compiler knows how big the array is in bytes. sizeof(fifty_numbers) will give you 200 bytes here.
Dynamic allocation: int *bunch_of_numbers = malloc(sizeof(int) * varying_size). As you can see, varying_size is not constant, so the compiler can't figure out how big the array is without executing the program. sizeof(bunch_of_numbers) gives you 4 bytes on a 32 bit system, or 8 bytes on a 64 bit system. The only one that know how big the array is would be the programmer. In your case, it's whoever wrote do_something_with_rates(), but you're discarding that information by either not returning it, or taking a size parameter.
It's not clear how do_something_with_rates() was declared exactly, but something like: void do_something_with_rates(Rate **rates) won't work as the function has no idea how big rates is. I recommend something like: void do_something_with_rates(size_t array_size, Rate **rates). At any rate, going by your requirements, it's still a ways away from working. Possible solutions are below:
You need to either return the new array's size:
size_t do_something_with_rates(size_t old_array_size, Rate **rates) {
Rate **new_rates;
*new_rates = malloc(sizeof(Rate) * n); // allocate n Rate objects
// carry out your operation on new_rates
// modifying rates
free(*rates); // releasing the memory taken up by the old array
*rates = *new_rates // make it point to the new array
return n; // returning the new size so that the caller knows
}
int main() {
Rate *rates = malloc(sizeof(Rate) * 20);
size_t new_size = do_something_with_rates(20, &rates);
// now new_size holds the size of the new array, which may or may not be 20
return 0;
}
Or pass in a size parameter for the function to set:
void do_something_with_rates(size_t old_array_size, size_t *new_array_size, Rate **rates) {
Rate **new_rates;
*new_rates = malloc(sizeof(Rate) * n); // allocate n Rate objects
*new_array_size = n; // setting the new size so that the caller knows
// carry out your operation on new_rates
// modifying rates
free(*rates); // releasing the memory taken up by the old array
*rates = *new_rates // make it point to the new array
}
int main() {
Rate *rates = malloc(sizeof(Rate) * 20);
size_t new_size;
do_something_with_rates(20, &new_size, &rates);
// now new_size holds the size of the new array, which may or may not be 20
return 0;
}
Why do I need to pass the old size as a parameter?
void do_something_with_rates(Rate **rates) {
// You don't know what n is. How would you
// know how many rate objects the caller wants
// you to process for any given call to this?
for (size_t i = 0; i < n; ++i)
// carry out your operation on new_rates
}
Everything changes when you have a size parameter:
void do_something_with_rates(size_t size, Rate **rates) {
for (size_t i = 0; i < size; ++i) // Now you know when to stop
// carry out your operation on new_rates
}
This is a very fundamental flaw with your program.
I want to also want the function to change the contents of the array:
size_t do_something_with_rates(size_t old_array_size, Rate **rates) {
Rate **new_rates;
*new_rates = malloc(sizeof(Rate) * n); // allocate n Rate objects
// carry out some operation on new_rates
Rate *array = *new_rates;
for (size_t i = 0; i < n; ++i) {
array[i]->timestamp = time();
// you can see the pattern
}
return n; // returning the new size so that the caller knows
}
sizeof produces a value (or code to produce a value) of the size of a type or the type of an expression at compile time. The size of an expression can therefore not change during the execution of the program. If you want that feature, use a variable, terminal value or a different programming language. Your choice. Whatever. C's better than Java.
char foo[42];
foo has either static storage duration (which is only partially related to the static keyword) or automatic storage duration.
Objects with static storage duration exist from the start of the program to the termination. Those global variables are technically called variables declared at file scope that have static storage duration and internal linkage.
Objects with automatic storage duration exist from the beginning of their initialisation to the return of the function. These are usually on the stack, though they could just as easily be on the graph. They're variables declared at block scope that have automatic storage duration and internal linkage.
In either case, todays compilers will encode 42 into the machine code. I suppose it'd be possible to modify the machine code, though that several thousands of lines you put into that task would be much better invested into storing the size externally (see other answer/s), and this isn't really a C question. If you really want to look into this, the only examples I can think of that change their own machine code are viruses... How are you going to avoid that antivirus heuristic?
Another option is to encode size information into a struct, use a flexible array member and then you can carry both the array and the size around as one allocation. Sorry, this is as close as you'll get to what you want. e.g.
struct T_vector {
size_t size;
T value[];
};
struct T_vector *T_make(struct T_vector **v) {
size_t index = *v ? (*v)->size++ : 0, size = index + 1;
if ((index & size) == 0) {
void *temp = realloc(*v, size * sizeof *(*v)->value);
if (!temp) {
return NULL;
}
*v = temp;
// (*v)->size = size;
*v = 42; // keep reading for a free cookie
}
return (*v)->value + index;
}
#define T_size(v) ((v) == NULL ? 0 : (v)->size)
int main(void) {
struct T_vector *v = NULL; T_size(v) == 0;
{ T *x = T_make(&v); x->value[0]; T_size(v) == 1;
x->y = y->x; }
{ T *y = T_make(&v); x->value[1]; T_size(v) == 2;
y->x = x->y; }
free(v);
}
Disclaimer: I only wrote this as an example; I don't intend to test or maintain it unless the intent of the example suffers drastically. If you want something I've thoroughly tested, use my push_back.
This may seem innocent, yet even with that disclaimer and this upcoming warning I'll likely see a comment along the lines of: Each successive call to make_T may render previously returned pointers invalid... True, and I can't think of much more I could do about that. I would advise calling make_T, modifying the value pointed at by the return value and discarding that pointer, as I've done above (rather explicitly).
Some compilers might even allow you to #define sizeof(x) T_size(x)... I'm joking; don't do this. Do it, mate; it's awesome!
Technically we aren't changing the size of an array here; we're allocating ahead of time and where necessary, reallocating and copying to a larger array. It might seem appealing to abstract allocation away this way in C at times... enjoy :)

Handling Boundary Conditions in OpenCL/CUDA

Given a 3D uniform grid, I would like to set the values of the border cells relative to the values of their nearest neighbor inside the grid. E.g., given a 10x10x10 grid, for a voxel at coordinate (0, 8, 8), I'd like to set a value as follows : val(0, 8, 8)=a*val(1,8,8).
Since, a could be any real number, I do not think texture + samplers can be used in this case. In addition, the method should work on normal buffers as well.
Also, since a boundary voxel coordinate could be either part of the grid's corner, edge, or face, 26 (= 8 + 12 + 6) different choices for looking up the nearest neighbor exist (e.g. if the coordinate was at (0,0,0) its nearest neighbor insided the grid would be (1, 1, 1)). So there is a lot of potential branching.
Is there a "elegant" way to accomplish this in OpenCL/CUDA? Also, is it advisable to handle boundary using a seperate kernel?
The most usual way of handling borders in CUDA is to check for all possible border conditions and act accordingly, that is:
If "this element" is out of bounds, then return (this is very useful in CUDA, where you will probably launch more threads than strictly necessary, so the extra threads must exit early in order to avoid writing on out-of-bounds memory).
If "this element" is at/near left border (minimum x) then do special operations for left border.
Same for right, up, down (and front and back, in 3D) borders.
Fortunately, on most occasions you can use max/min to simplify these operations, so you avoid too many ifs. I like to use an expression of this form:
source_pixel_x = max(0, min(thread_2D_pos.x + j, MAX_X));
source_pixel_y = ... // you get the idea
The result of these expressions is always bound between 0 and some MAX, thus clamping the out_of_bounds source pixels to the border pixels.
EDIT: As commented by DarkZeros, it is easier (and less error prone) to use the clamp() function. Not only it checks both min and max, it also allows vector types like float3 and clamps each dimension separately. See: clamp
Here is an example I did as an exercise, a 2D gaussian blur:
__global__
void gaussian_blur(const unsigned char* const inputChannel,
unsigned char* const outputChannel,
int numRows, int numCols,
const float* const filter, const int filterWidth)
{
const int2 thread_2D_pos = make_int2( blockIdx.x * blockDim.x + threadIdx.x,
blockIdx.y * blockDim.y + threadIdx.y);
const int thread_1D_pos = thread_2D_pos.y * numCols + thread_2D_pos.x;
if (thread_2D_pos.x >= numCols || thread_2D_pos.y >= numRows)
{
return; // "this output pixel" is out-of-bounds. Do not compute
}
int j, k, jn, kn, filterIndex = 0;
float value = 0.0;
int2 pixel_2D_pos;
int pixel_1D_pos;
// Now we'll process input pixels.
// Note the use of max(0, min(thread_2D_pos.x + j, numCols-1)),
// which is a way to clamp the coordinates to the borders.
for(k = -filterWidth/2; k <= filterWidth/2; ++k)
{
pixel_2D_pos.y = max(0, min(thread_2D_pos.y + k, numRows-1));
for(j = -filterWidth/2; j <= filterWidth/2; ++j,++filterIndex)
{
pixel_2D_pos.x = max(0, min(thread_2D_pos.x + j, numCols-1));
pixel_1D_pos = pixel_2D_pos.y * numCols + pixel_2D_pos.x;
value += ((float)(inputChannel[pixel_1D_pos])) * filter[filterIndex];
}
}
outputChannel[thread_1D_pos] = (unsigned char)value;
}
In OpenCL you could use Image3d to handle your 3d grid. Boundary handling could be achived with a sampler and a specific adress mode:
CLK_ADDRESS_REPEAT - out-of-range image coordinates are wrapped to the valid range. This address mode can only be used with normalized coordinates. If normalized coordinates are not used, this addressing mode may generate image coordinates that are undefined.
CLK_ADDRESS_CLAMP_TO_EDGE - out-of-range image coordinates are clamped to the extent.
CLK_ADDRESS_CLAMP32 - out-of-range image coordinates will return a border color. The border color is (0.0f, 0.0f, 0.0f, 0.0f) if image channel order is CL_A, CL_INTENSITY, CL_RA, CL_ARGB, CL_BGRA or CL_RGBA and is (0.0f, 0.0f, 0.0f, 1.0f) if image channel order is CL_R, CL_RG, CL_RGB or CL_LUMINANCE.
CLK_ADDRESS_NONE - for this address mode the programmer guarantees that the image coordinates used to sample elements of the image refer to a location inside the image; otherwise the results are undefined.
Additionally you can define the filter mode for the interpolation (nearest neighbor or linear).
Does this fit your needs? Otherwise, please give us more detail about you data and its boundary requirements.