How do I trim a Sprite programmatically in cocos2d-x?

The image can contain transparent pixels.
How do I get the minimum size of a rectangle that fits this image without transparent pixels?
I want to get something like this:
cocos2d::Rect getBoundingBoxTrim(cocos2d::Sprite* sprite);
Perhaps you cannot give me a full answer, but even some tips would help. I will be very grateful to you.

This is just a basic example. You will have to customize it.
Image* img = new Image();
img->initWithImageFile("my file.png");
Then you iterate over each pixel in the image, find the outermost opaque pixels, and use those to compute the rectangle.
int leftMost  = img->getWidth();  // will shrink to the first opaque column
int rightMost = -1;               // will grow to the last opaque column
int upperMost = img->getHeight(); // will shrink to the first opaque row
int lowerMost = -1;               // will grow to the last opaque row
const int bpp = 4; // bytes per pixel, assuming RGBA8888
unsigned char *data = img->getData();
for (int i = 0; i < img->getWidth(); i++)
{
    for (int j = 0; j < img->getHeight(); j++)
    {
        unsigned char *pixel = data + (i + j * img->getWidth()) * bpp;
        // You can read/change the pixel's RGBA values (0-255) here!
        // unsigned char r = *pixel;
        // unsigned char g = *(pixel + 1);
        // unsigned char b = *(pixel + 2);
        unsigned char a = *(pixel + 3);
        if (a > 0) {
            // This pixel is opaque. You can use a threshold other than 0,
            // depending on how much alpha you still want to treat as "transparent".
            // Complete yourself:
            if (i < leftMost) {
                leftMost = i;
            }
            if (i > rightMost) {
                rightMost = i;
            }
            if (j < upperMost) {
                upperMost = j;
            }
            if (j > lowerMost) {
                lowerMost = j;
            }
        }
    }
}
In the "Complete yourself" section, you just keep track of the boundary of where the first / last opaque pixel is.
You need four variables, one for each side.
At the end of the loops, you simply create a rectangle using those four values as its sides!
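Putting it together, here is a minimal sketch of the requested helper. It is an assumption-laden illustration, not tested code: it loads the image from its file (reading pixels back from a Sprite's already-loaded texture would need a RenderTexture instead), assumes RGBA8888, and returns the Rect in image coordinates (origin at the top-left, y growing downward).
// A sketch only: takes the sprite's image file rather than the Sprite itself.
cocos2d::Rect getBoundingBoxTrim(const std::string& imageFile)
{
    cocos2d::Image* img = new cocos2d::Image();
    img->initWithImageFile(imageFile);
    const int w = img->getWidth();
    const int h = img->getHeight();
    const int bpp = 4; // RGBA8888 assumed
    unsigned char* data = img->getData();
    int left = w, right = -1, top = h, bottom = -1;
    for (int j = 0; j < h; ++j) {
        for (int i = 0; i < w; ++i) {
            unsigned char a = data[(i + j * w) * bpp + 3]; // alpha channel
            if (a > 0) { // opaque enough; raise the threshold if needed
                if (i < left)   left = i;
                if (i > right)  right = i;
                if (j < top)    top = j;
                if (j > bottom) bottom = j;
            }
        }
    }
    delete img;
    if (right < left) // image is fully transparent
        return cocos2d::Rect::ZERO;
    return cocos2d::Rect(left, top, right - left + 1, bottom - top + 1);
}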
If this solves your problem, mark it as correct.

Related

Concurrent Writing CUDA

I am new to CUDA and I am facing a problem with a basic projection kernel. What I am trying to do is to project a 3D point cloud into a 2D image. In case multiple points project to the same pixel, only the point with the smallest depth (the closest one) should be written on the matrix.
Suppose two 3D points fall in the same image pixel (0, 0); the way I am implementing the depth check here, if (depth > entry.depth), is not working, since the two threads (from two different blocks) execute this "in parallel". In the printf statement, in fact, both entry.depth values print the numeric limit (the initialization value).
To solve this problem I thought of using a tensor-like structure, where each image pixel corresponds to an array of values. Afterwards the array is reduced and only the point with the smallest depth is kept. Are there any smarter and more efficient ways of solving this problem?
__global__ void kernel_project(CUDAWorkspace* workspace_, const CUDAMatrix* matrix_) {
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid >= matrix_->size())
        return;
    const Point3& full_point = matrix_->at(tid);
    float depth = 0.f;
    Point2 image_point;
    // full point as input, depth and image point as output
    const bool& is_good = project(image_point, depth, full_point); // dst, dst, src
    if (!is_good)
        return;
    const int irow = (int) image_point.y();
    const int icol = (int) image_point.x();
    if (!workspace_->inside(irow, icol)) {
        return;
    }
    // get pointer to entry
    WorkspaceEntry& entry = (*workspace_)(irow, icol);
    // entry.depth is set initially to a numeric limit
    if (depth > entry.depth) // PROBLEM HERE
        return;
    printf("entry depth %f\n", entry.depth); // BOTH PRINT THE NUMERIC LIMIT
    entry.point = full_point;
    entry.depth = depth;
}
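One common way to resolve this kind of z-buffer race (a sketch under assumptions, not code from this post) is a two-pass scheme built on atomicMin. Since the bit pattern of a non-negative IEEE-754 float compares the same way as an unsigned integer, atomicMin on the reinterpreted depth bits keeps the closest point per pixel without a read-then-write race. The pixel_of_point and depth_of_point inputs here are hypothetical precomputed projections standing in for project().
#include <cstdint>

// Pass 1: assumes all depths are non-negative, so the bit pattern of a
// positive float orders exactly like an unsigned int, and atomicMin keeps
// the smallest depth per pixel atomically.
__global__ void resolve_depth(uint32_t* depth_bits,        // initialized to 0xFFFFFFFFu
                              const int* pixel_of_point,   // flat pixel index per 3D point
                              const float* depth_of_point, // projected depth per 3D point
                              int n_points)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid >= n_points) return;
    uint32_t bits = __float_as_uint(depth_of_point[tid]);
    atomicMin(&depth_bits[pixel_of_point[tid]], bits);
}
In a second pass, each point compares its own depth bits against the winning value in its pixel and only the matching thread writes its payload (e.g. entry.point). This is usually simpler and far cheaper in memory than the per-pixel array-plus-reduction idea.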

How does a computer resize an image?

Image resizing is nearly universal in any GUI framework. In fact, one of the first things you learn when starting out in web development is how to scale images using CSS or HTML's img attributes. But how does this work?
When I tell the computer to scale a 500x500 img to 100x50, or the reverse, how does the computer know which pixels to draw from the original image? Lastly, is it reasonably easy for me to write my own "image transformer" in another programming language without significant drops in performance?
Based on a bit of research, I can conclude that most web browsers use nearest-neighbor or linear interpolation for image resizing. I've written a proof-of-concept nearest-neighbor algorithm that successfully resizes images, albeit VERY SLOWLY.
using System;
using System.Drawing;

namespace Image_Resize
{
    class ImageResizer
    {
        public static Image Resize(Image baseImage, int newHeight, int newWidth)
        {
            var baseBitmap = new Bitmap(baseImage);
            int baseHeight = baseBitmap.Height;
            int baseWidth = baseBitmap.Width;
            // Nearest-neighbor interpolation maps each pixel of the resized image to the
            // closest pixel of the old image. Say we have a 2x2 image and want a 9x9.
            // Step 1. Compute the scale factor from new back to old: divide the old width
            // by the new width (i.e. 2/9 = 0.2222).
            float widthRatio = (float)baseWidth / newWidth;
            float heightRatio = (float)baseHeight / newHeight;
            // Step 2. Scale each new pixel's coordinates down and truncate to integers.
            // A new pixel at (4,5) maps to (0.8888, 1.1111), which GOES DOWN to (0,1) on
            // the 2x2 image. Seems counterintuitive, but imagining a 2x2 grid, (4,5) is
            // in the lower-left region, so it makes sense for it to land on pixel (0,1).
            var watch = new System.Diagnostics.Stopwatch();
            watch.Start();
            Bitmap resized = new Bitmap(newWidth, newHeight);
            int oldX = 0; int oldY = 0;
            for (int i = 0; i < newWidth; i++)
            {
                oldX = (int)(i * widthRatio);
                for (int j = 0; j < newHeight; j++)
                {
                    oldY = (int)(j * heightRatio);
                    Color newColor = baseBitmap.GetPixel(oldX, oldY);
                    resized.SetPixel(i, j, newColor);
                }
            }
            // This works, but is 100x slower than the standard library methods due to the
            // GetPixel() and SetPixel() calls. The average time for a 1920x1080 image is
            // about a second.
            watch.Stop();
            Console.WriteLine("Resizing the image took " + watch.Elapsed.TotalMilliseconds + "ms.");
            return resized;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var img = Image.FromFile(@"C:\Users\kpsin\Pictures\codeimage.jpg");
            img = ImageResizer.Resize(img, 1000, 1500);
            img.Save(@"C:\Users\kpsin\Pictures\codeimage1.jpg");
        }
    }
}
I do hope that someone else can come along and provide either a) a faster algorithm for nearest neighbor because I'm overlooking something silly, or b) another way that image scalers work that I'm not aware of. Otherwise, question... answered?
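The usual fix for the slowness is not a different algorithm but different pixel access: per-pixel GetPixel()/SetPixel() calls carry heavy API overhead, while indexing the raw pixel buffer directly does not (in C#, Bitmap.LockBits exposes that buffer). As a hedged illustration of the same idea in plain C++ over raw 32-bit RGBA buffers (the names are mine, not from the post):
#include <cstdint>
#include <vector>

// Nearest-neighbor resize over raw 32-bit RGBA pixels. The speedup over the
// GetPixel/SetPixel version comes purely from touching memory directly.
std::vector<uint32_t> resizeNearest(const std::vector<uint32_t>& src,
                                    int srcW, int srcH, int dstW, int dstH)
{
    std::vector<uint32_t> dst(static_cast<size_t>(dstW) * dstH);
    const float xRatio = static_cast<float>(srcW) / dstW;
    const float yRatio = static_cast<float>(srcH) / dstH;
    for (int y = 0; y < dstH; ++y) {
        const int srcY = static_cast<int>(y * yRatio); // truncate, as in the C# version
        const uint32_t* srcRow = &src[static_cast<size_t>(srcY) * srcW];
        uint32_t* dstRow = &dst[static_cast<size_t>(y) * dstW];
        for (int x = 0; x < dstW; ++x)
            dstRow[x] = srcRow[static_cast<int>(x * xRatio)];
    }
    return dst;
}
The inner loop is a single indexed copy per pixel, which is why this runs orders of magnitude faster than the GetPixel/SetPixel version while producing the same nearest-neighbor result.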

3D binary image to 3D mesh using itk

I'm trying to generate a 3D mesh from a 3D RLE binary mask.
In ITK, I found a class named itkBinaryMask3DMeshSource; it's based on the Marching Cubes algorithm.
Some examples, such as ExtractIsoSurface, use this class.
In my case, I have an RLE 3D binary mask, but represented in a 1D vector format.
I'm writing a function for this task.
My function takes as parameters:
Inputs: crle 1D vector (computed RLE), dimension Int3
Output: coords + coord indices (or generate a single file containing both of those arrays, which I can then use to visualize the mesh)
As a first step, I decode the computed RLE.
Next, I use an image iterator to create an image compatible with BinaryMask3DMeshSource.
I'm blocked at the last step.
This is my code :
void GenerateMeshFromCrle(const std::vector<int>& crle, const Int3& dim,
                          std::vector<float>* coords, std::vector<int>* coord_indices,
                          int* nodes, int* cells, const char* outputmeshfile) {
    std::vector<int> mask(crle.back());
    CrleDecode(crle, mask.data());
    // here we define our itk image type with 3 dimensions
    using ImageType = itk::Image< unsigned char, 3 >;
    ImageType::Pointer image = ImageType::New();
    // an image is defined by a start index and a size for each axis
    // by default, we set the start index to x=0, y=0, z=0
    ImageType::IndexType start;
    start[0] = 0; // first index on X
    start[1] = 0; // first index on Y
    start[2] = 0; // first index on Z
    // until here, no problem
    // we set the image size on x,y,z from the dim input parameter
    // itk takes Z Y X
    ImageType::SizeType size;
    size[0] = dim.z; // size along X
    size[1] = dim.y; // size along Y
    size[2] = dim.x; // size along Z
    ImageType::RegionType region;
    region.SetSize(size);
    region.SetIndex(start);
    image->SetRegions(region);
    image->Allocate();
    // Set the pixels to the values from the rle
    // This is a fast way
    itk::ImageRegionIterator<ImageType> imageIterator(image, region);
    int n = 0;
    while (!imageIterator.IsAtEnd() && n < mask.size()) {
        // Set the current pixel to the value from the rle
        imageIterator.Set(mask[n]);
        ++imageIterator;
        ++n;
    }
    // In this step, we launch itkBinaryMask3DMeshSource
    using BinaryThresholdFilterType = itk::BinaryThresholdImageFilter< ImageType, ImageType >;
    BinaryThresholdFilterType::Pointer threshold = BinaryThresholdFilterType::New();
    threshold->SetInput(image->GetOutput()); // here it's an error, since image has no GetOutput member
    threshold->SetLowerThreshold(0);
    threshold->SetUpperThreshold(1);
    threshold->SetOutsideValue(0);
    using MeshType = itk::Mesh< double, 3 >;
    using FilterType = itk::BinaryMask3DMeshSource< ImageType, MeshType >;
    FilterType::Pointer filter = FilterType::New();
    filter->SetInput(threshold->GetOutput());
    filter->SetObjectValue(1);
    using WriterType = itk::MeshFileWriter< MeshType >;
    WriterType::Pointer writer = WriterType::New();
    writer->SetFileName(outputmeshfile);
    writer->SetInput(filter->GetOutput());
}
Any ideas?
I appreciate your time.
Since image is not a filter, you can plug it in directly: threshold->SetInput(image);. At the end of this function, you also need writer->Update();. The rest looks good.
Side note: it looks like you might benefit from using an import filter instead of manually iterating over the buffer and copying the values one at a time.
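Concretely, the answer's two fixes would look something like this (a sketch of only the changed lines; the rest of the function is assumed unchanged from the question):
// Fix 1: an itk::Image is not a filter, so it has no GetOutput();
// pass the image itself as the threshold filter's input.
threshold->SetInput(image);

// ... BinaryThresholdImageFilter settings and BinaryMask3DMeshSource as before ...

// Fix 2: ITK pipelines are lazy; nothing executes until the last
// stage is updated, so trigger the whole chain through the writer.
writer->SetFileName(outputmeshfile);
writer->SetInput(filter->GetOutput());
writer->Update(); // runs threshold -> mesh source -> writer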

Handling Boundary Conditions in OpenCL/CUDA

Given a 3D uniform grid, I would like to set the values of the border cells relative to the values of their nearest neighbors inside the grid. E.g., given a 10x10x10 grid, for the voxel at coordinate (0, 8, 8), I'd like to set a value as follows: val(0, 8, 8) = a * val(1, 8, 8).
Since a could be any real number, I do not think textures + samplers can be used in this case. In addition, the method should work on normal buffers as well.
Also, since a boundary voxel coordinate could be part of the grid's corners, edges, or faces, there are 26 (= 8 + 12 + 6) different choices for looking up the nearest neighbor (e.g. if the coordinate were at (0, 0, 0), its nearest neighbor inside the grid would be (1, 1, 1)). So there is a lot of potential branching.
Is there an "elegant" way to accomplish this in OpenCL/CUDA? Also, is it advisable to handle the boundary in a separate kernel?
The most usual way of handling borders in CUDA is to check for all possible border conditions and act accordingly, that is:
If "this element" is out of bounds, then return (this is very useful in CUDA, where you will probably launch more threads than strictly necessary, so the extra threads must exit early in order to avoid writing on out-of-bounds memory).
If "this element" is at/near left border (minimum x) then do special operations for left border.
Same for right, up, down (and front and back, in 3D) borders.
Fortunately, on most occasions you can use max/min to simplify these operations, so you avoid too many ifs. I like to use an expression of this form:
source_pixel_x = max(0, min(thread_2D_pos.x + j, MAX_X));
source_pixel_y = ... // you get the idea
The result of these expressions is always bounded between 0 and some MAX, thus clamping the out-of-bounds source pixels to the border pixels.
EDIT: As commented by DarkZeros, it is easier (and less error prone) to use the clamp() function. Not only does it check both min and max, it also accepts vector types like float3 and clamps each dimension separately. See: clamp
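For example (a sketch: clamp() is built into OpenCL's kernel language, while in CUDA you would write the scalar equivalent yourself):
// CUDA has no built-in integer clamp, so define the equivalent of
// OpenCL's clamp(x, lo, hi) as a tiny device helper:
__device__ __forceinline__ int clampi(int x, int lo, int hi)
{
    return max(lo, min(x, hi));
}

// Usage, replacing the max/min expression above:
// pixel_2D_pos.x = clampi(thread_2D_pos.x + j, 0, numCols - 1);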
Here is an example I did as an exercise, a 2D Gaussian blur:
__global__
void gaussian_blur(const unsigned char* const inputChannel,
                   unsigned char* const outputChannel,
                   int numRows, int numCols,
                   const float* const filter, const int filterWidth)
{
    const int2 thread_2D_pos = make_int2(blockIdx.x * blockDim.x + threadIdx.x,
                                         blockIdx.y * blockDim.y + threadIdx.y);
    const int thread_1D_pos = thread_2D_pos.y * numCols + thread_2D_pos.x;
    if (thread_2D_pos.x >= numCols || thread_2D_pos.y >= numRows)
    {
        return; // "this output pixel" is out-of-bounds. Do not compute
    }
    int j, k, filterIndex = 0;
    float value = 0.0f;
    int2 pixel_2D_pos;
    int pixel_1D_pos;
    // Now we'll process input pixels.
    // Note the use of max(0, min(thread_2D_pos.x + j, numCols-1)),
    // which is a way to clamp the coordinates to the borders.
    for (k = -filterWidth/2; k <= filterWidth/2; ++k)
    {
        pixel_2D_pos.y = max(0, min(thread_2D_pos.y + k, numRows-1));
        for (j = -filterWidth/2; j <= filterWidth/2; ++j, ++filterIndex)
        {
            pixel_2D_pos.x = max(0, min(thread_2D_pos.x + j, numCols-1));
            pixel_1D_pos = pixel_2D_pos.y * numCols + pixel_2D_pos.x;
            value += ((float)(inputChannel[pixel_1D_pos])) * filter[filterIndex];
        }
    }
    outputChannel[thread_1D_pos] = (unsigned char)value;
}
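For completeness, a sketch of how such a kernel is typically launched; the 16x16 block size is an assumption, and the rounded-up grid is exactly why the early-return bounds check at the top of the kernel matters:
// Hypothetical host-side launch: one thread per output pixel, 16x16 threads
// per block, and enough blocks to cover the image. The in-kernel bounds check
// handles the partial blocks at the right/bottom edges.
void launch_gaussian_blur(const unsigned char* d_in, unsigned char* d_out,
                          int numRows, int numCols,
                          const float* d_filter, int filterWidth)
{
    dim3 blockSize(16, 16);
    dim3 gridSize((numCols + blockSize.x - 1) / blockSize.x,
                  (numRows + blockSize.y - 1) / blockSize.y);
    gaussian_blur<<<gridSize, blockSize>>>(d_in, d_out, numRows, numCols,
                                           d_filter, filterWidth);
}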
In OpenCL you could use Image3d to handle your 3D grid. Boundary handling could be achieved with a sampler and a specific address mode:
CLK_ADDRESS_REPEAT - out-of-range image coordinates are wrapped to the valid range. This address mode can only be used with normalized coordinates. If normalized coordinates are not used, this addressing mode may generate image coordinates that are undefined.
CLK_ADDRESS_CLAMP_TO_EDGE - out-of-range image coordinates are clamped to the extent.
CLK_ADDRESS_CLAMP - out-of-range image coordinates will return a border color. The border color is (0.0f, 0.0f, 0.0f, 0.0f) if image channel order is CL_A, CL_INTENSITY, CL_RA, CL_ARGB, CL_BGRA or CL_RGBA and is (0.0f, 0.0f, 0.0f, 1.0f) if image channel order is CL_R, CL_RG, CL_RGB or CL_LUMINANCE.
CLK_ADDRESS_NONE - for this address mode the programmer guarantees that the image coordinates used to sample elements of the image refer to a location inside the image; otherwise the results are undefined.
Additionally you can define the filter mode for the interpolation (nearest neighbor or linear).
Does this fit your needs? Otherwise, please give us more detail about your data and its boundary requirements.

CUDA and Monte Carlo with local behavior defined

I have a question about a strange behavior in CUDA.
I am currently developing a Monte Carlo simulation of particle trajectories, and I am doing the following thing.
The position p(n) of my particle at a given date t(n) depends on the position p(n-1) of my particle at the previous date t(n-1). Indeed, let's say the value v(n) is computed from the value p(n-1). Here is a simplified example of my code:
__device__ inline double calculateStep(double drift, double vol, double dt, double randomWalk, double S_t) {
    return exp((drift - vol*vol*0.5)*dt + randomWalk*vol*sqrt(dt))*S_t;
}

__device__ double doSomethingWith(double v_n, ...) {
    ...
    return v_n*exp(t)*S;
}

__global__ void myMCsimulation(double* matrice, double* randomWalk, int nbreSimulation, int nPaths, double drift, ...) {
    double dt = T/nPaths;
    unsigned int tid = threadIdx.x + blockDim.x * blockIdx.x;
    unsigned int stride = blockDim.x*gridDim.x;
    unsigned int index = tid;
    double mydt = (index - nbreSimulation)/nbreSimulation*dt + dt;
    for (index = tid; index < nbreSimulation*nPaths; index += stride) {
        if (index >= nbreSimulation)
        {
            double v_n = doSomethingWith(drift, dt, matrice[index - nbreSimulation]);
            matrice[index] = matrice[index - nbreSimulation] * calculateStep(drift, v_n, dt, randomWalk[index]);
        }
        ...
    }
}
The last code line:
matrice[index] = matrice[index - nbreSimulation] * calculateStep(drift, v_n, dt, randomWalk[index]);
enables me to fill in only the second row of the matrix matrice. I don't know why.
When I change the code line to:
matrice[index] = doSomethingWith(drift, dt, matrice[index - nbreSimulation]);
my matrix is filled in correctly and all my values change; I am then able to read back matrice[index - nbreSimulation].
I think this is a concurrent-access problem, but I am not sure; I tried __syncthreads() but it did not work.
Could someone please help on this point?
Many thanks
I have changed my code to the following and now it works perfectly.
if (index < nbreSimulation) {
    matrice[index] = S0;
    for (workingCol = 1; workingCol < nPaths; workingCol++) {
        previousMove = index;
        index = index + nbreSimulation;
        ................
        matrice[index] = calculateStep(drift, vol_int[index], dt, randomWalk[index], matrice[previousMove]);
    }
}
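For clarity, here is a minimal self-contained sketch of that fix (the signature and the vol_int/randomWalk/S0 inputs are assumptions pieced together from the snippets above): each thread owns one simulation path and walks the time steps sequentially, so every step reads a value the same thread wrote in the previous iteration, and no ordering between threads or blocks is needed.
#include <math.h>

__device__ inline double calculateStep(double drift, double vol, double dt,
                                       double randomWalk, double S_t)
{
    // same formula as in the question
    return exp((drift - vol * vol * 0.5) * dt + randomWalk * vol * sqrt(dt)) * S_t;
}

// One thread per simulation path; the time loop is sequential inside the
// thread, so matrice[prev] is always a value this thread already wrote.
__global__ void myMCsimulation(double* matrice, const double* vol_int,
                               const double* randomWalk,
                               int nbreSimulation, int nPaths,
                               double drift, double dt, double S0)
{
    unsigned int sim = threadIdx.x + blockDim.x * blockIdx.x;
    if (sim >= nbreSimulation) return;
    matrice[sim] = S0;                            // first row: initial value
    unsigned int prev = sim;
    for (int step = 1; step < nPaths; ++step) {
        unsigned int cur = prev + nbreSimulation; // next row, same column
        matrice[cur] = calculateStep(drift, vol_int[cur], dt,
                                     randomWalk[cur], matrice[prev]);
        prev = cur;
    }
}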
I have also tried the following:
I declared a shared variable (an array of doubles) which contains the value computed at each iteration:
extern __shared__ double mat[]; // dynamically sized shared memory
......
for (index = tid; index < nbreSimulation*nPaths; index += stride) {
    .....
    mat[index] = computedValue;
    ......
}
Without success. Does anyone see the issue?