A game I am currently developing uses a 5x5 matrix to change the colors of the image on a per pixel basis. I was wondering if anyone has developed an extremely fast algorithm for something like this.
For every Pixel(setPixel(sourcePixel * Matrix))
I have built my own algorithm for this: I get and set pixels on a Pixmap, iterating over every pixel with getPixel/setPixel and drawing a new Pixmap from the result. It is reasonably fast (150 million pixels in about 3 seconds), but I have another idea that avoids the source Pixmap altogether, and I am unsure how to implement it. Libgdx provides a FileHandle.readBytes() method that reads a file (in my case a PNG) into a byte array. My thought was that, rather than creating a Pixmap, I could read the byte array while iterating over the pixels. Since I am drawing a new Pixmap while iterating, there really is no point in creating one for the source image in the first place.
In my tests, 70% of the time taken by my current algorithm is spent in Pixmap.getPixel(x, y), and I could bypass this by reading the byte array directly. I have looked online for PNG readers that work on byte arrays, but to no avail.
Note that I am unable to use ImageIO because this is an Android-based game. Would reading the byte array data while iterating make it faster, and is it possible to do this?
In the code below, JList is basically a HashMap in this context:
private static JList<Integer, Pixmap> colorShiftImage(Pixmap p, JList<Integer, float[][]> cms){
    // One output Pixmap per color matrix, keyed by the matrix.
    JList<float[][], Pixmap> tempList = new JList<>();
    for(int i = cms.size() - 1; i > -1; --i){
        tempList.add(cms.getInt(i), new Pixmap(p.getWidth(), p.getHeight(), Pixmap.Format.RGBA8888));
    }
    for(int y = p.getHeight() - 1; y > -1; --y){
        for(int x = p.getWidth() - 1; x > -1; --x){
            int v = p.getPixel(x, y);
            // Skip pixels whose four channels are all zero.
            if(v != 0) {
                // Unpack the RGBA8888 value into its channels.
                int r = ((v & 0xff000000) >>> 24);
                int g = ((v & 0x00ff0000) >>> 16);
                int b = ((v & 0x0000ff00) >>> 8);
                int a = ((v & 0x000000ff));
                for(int i = tempList.size() - 1; i > -1; --i) {
                    float[][] c = tempList.getIDList().get(i);
                    tempList.getInt(i).drawPixel(x, y, (((l((r * c[0][0]) + (c[1][0] * g) + (c[2][0] * b) + (c[3][0] * a) + c[4][0])) << 24)
                            | ((l((r * c[0][1]) + (c[1][1] * g) + (c[2][1] * b) + (c[3][1] * a) + c[4][1])) << 16)
                            | ((l((r * c[0][2]) + (c[1][2] * g) + (c[2][2] * b) + (c[3][2] * a) + c[4][2])) << 8)
                            | ((l((r * c[0][3]) + (c[1][3] * g) + (c[2][3] * b) + (c[3][3] * a) + c[4][3])))));
                }
            }
        }
    }
    JList<Integer, Pixmap> returnL = new JList<>();
    for(int i = tempList.size() - 1; i > - 1; --i){
        returnL.add(cms.getIDList().get(i), tempList.getInt(i));
    }
    return returnL;
}
// Clamp a channel value to the 0..255 range.
public static int l(float v){
    if(v < 0)return 0;
    else if(v > 255)return 255;
    return (int) v;
}
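One way to avoid the per-pixel getPixel call is to work on the decoded pixel bytes directly. FileHandle.readBytes() returns the still-compressed PNG file contents, so those bytes are not pixels. A decoded RGBA8888 Pixmap, however, exposes its raw data through Pixmap.getPixels() as a ByteBuffer. The following is a minimal sketch in plain Java, assuming 4 bytes per pixel in R, G, B, A order and the same matrix layout as the code above. The class and method names are hypothetical, and unlike the original loop it does not skip all-zero pixels.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: applies a 5x4 color matrix to raw RGBA8888 bytes. */
class ColorMatrixSketch {

    /** Clamp a float channel value to the 0..255 range. */
    static int clamp(float v) {
        if (v < 0) return 0;
        if (v > 255) return 255;
        return (int) v;
    }

    /**
     * Reads each RGBA pixel from src and writes the transformed pixel to dst.
     * m[0..3][k] are the weights of r, g, b, a for output channel k,
     * and m[4][k] is the constant offset (same layout as the code above).
     */
    static void apply(ByteBuffer src, ByteBuffer dst, float[][] m) {
        int n = src.capacity();
        for (int i = 0; i < n; i += 4) {
            // Bytes are signed in Java, so mask to get 0..255.
            int r = src.get(i) & 0xff;
            int g = src.get(i + 1) & 0xff;
            int b = src.get(i + 2) & 0xff;
            int a = src.get(i + 3) & 0xff;
            for (int k = 0; k < 4; k++) {
                float v = r * m[0][k] + g * m[1][k] + b * m[2][k] + a * m[3][k] + m[4][k];
                dst.put(i + k, (byte) clamp(v));
            }
        }
    }
}
```

With libgdx, src and dst would be the buffers returned by getPixels() on the source and destination Pixmaps. Whether this is actually faster on a given device would need to be measured.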
I'm trying to get the change in orientation between two deviceorientation events along the left-right axis and the top-bottom axis, those axes usually being defined as the phone's x and y axes (https://developer.mozilla.org/en-US/docs/Web/Guide/Events/Orientation_and_motion_data_explained).
That is, between instants t1 and t2, where those phone axes move from (x1, y1) to (x2, y2), I'd like to get (angle(x2-x1), angle(y1-y2)).
When the device is in portrait mode (as opposed to landscape mode), those axes seem to correspond to beta and gamma. However, when the phone is vertical (bottom facing the ground), the gamma value becomes extremely unstable and jumps from 90 to -90 degrees (at the same time, alpha jumps by 180 degrees). You can easily see that here on your phone.
I'd like to avoid that, and also get values in the 360-degree range. Here is what I have so far:
// assuming portrait mode
var beta0, gamma0;
window.addEventListener('deviceorientation', function(orientation) {
    // read the angles from the event object
    var beta = orientation.beta, gamma = orientation.gamma;
    if (typeof beta0 === 'undefined') {
        beta0 = beta;
        gamma0 = gamma;
    }
    console.log('user has moved to the left by', gamma - gamma0, ' and to the top by', beta - beta0);
});
That works OK when the device is mostly horizontal, and not at all when it is vertical.
All right. First, a simple explanation of the device orientation input:
The absolute coordinate system, (X, Y, Z) is such that X is East, Y is North and Z is up. The device relative coordinate system, (x, y, z) is such that x is right, y is top and z is up. Then the orientation angles, (alpha, beta, gamma) are the angles that describe the succession of three simple rotations that change (X, Y, Z) to (x, y, z) as so:
rotate around Z by alpha degrees, which transforms (X, Y, Z) to (X', Y', Z') with Z' = Z
rotate around X' by beta degrees, which transforms (X', Y', Z') to (X'', Y'', Z'') with X'' = X'
rotate around Y'' by gamma degrees, which transforms (X'', Y'', Z'') to (x, y, z) with y = Y''
(they are called intrinsic Tait-Bryan angles of type Z-X'-Y'')
Now we can get the corresponding rotation matrix by composing the simple rotation matrices that each correspond to one of the three rotations.
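$$
R(A, B, C) = R_y(C)\,R_x(B)\,R_z(A) =
\begin{bmatrix} cC & 0 & sC \\ 0 & 1 & 0 \\ -sC & 0 & cC \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & cB & -sB \\ 0 & sB & cB \end{bmatrix}
\begin{bmatrix} cA & -sA & 0 \\ sA & cA & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$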
where A, B, C are short for alpha, beta, gamma and s, c for sin, cos.
Now, we are interested in the angles of the right-left (around the y axis) and top-down (around the x axis) rotation deltas between two positions (x, y, z) and (x', y', z') that correspond to the orientations (A, B, C) and (A', B', C').
The coordinates of (x', y', z') in terms of (x, y, z) are given by R(A', B', C') * R(A, B, C)^-1 = R(A', B', C') * R(A, B, C)^T, since the inverse of an orthogonal (rotation) matrix is its transpose. Finally, if z' = p*x + q*y + r*z, the angles of those rotations are p around the right-left axis and q around the top-down one (this is true for small angles, which assumes frequent orientation updates; otherwise asin(p) and asin(q) are closer to the truth).
So here is some javascript to get the rotation matrix:
/*
 * gl-matrix is a nice library that handles rotation stuff efficiently
 * The 3x3 matrix is a 9 element array
 * such that indexes 0-2 correspond to the first column, 3-5 to the second column and 6-8 to the third
 */
import {mat3} from 'gl-matrix';
let _x, _y, _z;
let cX, cY, cZ, sX, sY, sZ;
/*
 * return the rotation matrix corresponding to the orientation angles (in radians)
 */
const fromOrientation = function(out, alpha, beta, gamma) {
    _z = alpha;
    _x = beta;
    _y = gamma;
    cX = Math.cos( _x );
    cY = Math.cos( _y );
    cZ = Math.cos( _z );
    sX = Math.sin( _x );
    sY = Math.sin( _y );
    sZ = Math.sin( _z );
    out[0] = cZ * cY + sZ * sX * sY;   // row 1, col 1
    out[1] = cX * sZ;                  // row 2, col 1
    out[2] = - cZ * sY + sZ * sX * cY; // row 3, col 1
    out[3] = - cY * sZ + cZ * sX * sY; // row 1, col 2
    out[4] = cZ * cX;                  // row 2, col 2
    out[5] = sZ * sY + cZ * cY * sX;   // row 3, col 2
    out[6] = cX * sY;                  // row 1, col 3
    out[7] = - sX;                     // row 2, col 3
    out[8] = cX * cY;                  // row 3, col 3
    return out;
};
and now we get the angular deltas:
const deg2rad = Math.PI / 180; // Degree-to-Radian conversion
let currentRotMat, previousRotMat, inverseMat, relativeRotationDelta,
    totalRightAngularMovement=0, totalTopAngularMovement=0;
window.addEventListener('deviceorientation', ({alpha, beta, gamma}) => {
    // init values if necessary
    if (!previousRotMat) {
        previousRotMat = mat3.create();
        currentRotMat = mat3.create();
        inverseMat = mat3.create();
        relativeRotationDelta = mat3.create();
        fromOrientation(currentRotMat, alpha * deg2rad, beta * deg2rad, gamma * deg2rad);
    }
    // save last orientation
    mat3.copy(previousRotMat, currentRotMat);
    // get rotation in the previous orientation coordinate
    fromOrientation(currentRotMat, alpha * deg2rad, beta * deg2rad, gamma * deg2rad);
    mat3.transpose(inverseMat, previousRotMat); // for rotation matrix, inverse is transpose
    mat3.multiply(relativeRotationDelta, currentRotMat, inverseMat);
    // add the angular deltas to the cumulative rotation
    totalRightAngularMovement += Math.asin(relativeRotationDelta[6]) / deg2rad;
    totalTopAngularMovement += Math.asin(relativeRotationDelta[7]) / deg2rad;
});
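To sanity-check the matrix algebra without a device, the same computation can be reproduced on plain arrays. The sketch below is in Java and uses no library. It keeps the column-major layout described above, and the class and method names are made up for illustration. For a pure change in beta from 0 to 10 degrees (alpha and gamma at 0), element 7 of the relative matrix is -sin(10°), so the delta read from that element is -10 degrees.

```java
/** Hypothetical stand-alone check of the relative-rotation computation. */
class OrientationDeltaSketch {

    /** Rotation matrix Ry(gamma) * Rx(beta) * Rz(alpha), column-major, angles in radians. */
    static double[] fromOrientation(double alpha, double beta, double gamma) {
        double cX = Math.cos(beta), cY = Math.cos(gamma), cZ = Math.cos(alpha);
        double sX = Math.sin(beta), sY = Math.sin(gamma), sZ = Math.sin(alpha);
        return new double[] {
            cZ * cY + sZ * sX * sY,  cX * sZ, -cZ * sY + sZ * sX * cY, // column 1
            -cY * sZ + cZ * sX * sY, cZ * cX, sZ * sY + cZ * cY * sX,  // column 2
            cX * sY,                 -sX,     cX * cY                  // column 3
        };
    }

    /** Returns current * transpose(previous), both column-major 3x3. */
    static double[] relative(double[] current, double[] previous) {
        double[] out = new double[9];
        for (int col = 0; col < 3; col++) {
            for (int row = 0; row < 3; row++) {
                double sum = 0;
                for (int k = 0; k < 3; k++) {
                    // current(row, k) * previous^T(k, col) = current(row, k) * previous(col, k)
                    sum += current[row + 3 * k] * previous[col + 3 * k];
                }
                out[row + 3 * col] = sum;
            }
        }
        return out;
    }

    /** Angular deltas in degrees, read from elements 6 and 7 as in the code above. */
    static double[] deltasDegrees(double[] current, double[] previous) {
        double[] rel = relative(current, previous);
        return new double[] {
            Math.toDegrees(Math.asin(rel[6])),
            Math.toDegrees(Math.asin(rel[7]))
        };
    }
}
```

Two identical orientations give deltas of zero, because a rotation matrix multiplied by its own transpose is the identity.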
Finally, to account for screen orientation, we have to replace
_z = alpha;
_x = beta;
_y = gamma;
by
const screen = window.screen;
const getScreenOrientation = () => {
const oriented = screen && (screen.orientation || screen.mozOrientation);
if (oriented) switch (oriented.type || oriented) {
case 'landscape-primary':
return 90;
case 'landscape-secondary':
return -90;
case 'portrait-secondary':
return 180;
case 'portrait-primary':
return 0;
}
return window.orientation|0; // defaults to zero if orientation is unsupported
};
const screenOrientation = getScreenOrientation();
_z = alpha;
if (screenOrientation === 90) {
_x = - gamma;
_y = beta;
}
else if (screenOrientation === -90) {
_x = gamma;
_y = - beta;
}
else if (screenOrientation === 180) {
_x = - beta;
_y = - gamma;
}
else if (screenOrientation === 0) {
_x = beta;
_y = gamma;
}
Note that the cumulative right-left and top-bottom angles depend on the path chosen by the user; they cannot be inferred directly from the device orientation but have to be tracked through the movement. You can arrive at the same position with different movements:
method 1:
keep your phone horizontal and rotate by 90 degrees clockwise. (this is neither a left-right nor a top-bottom rotation)
keep your phone in landscape mode and rotate by 90 toward you. (this is neither a 90 degrees left-right rotation)
keep your phone facing you and rotate by 90 so that it's up. (this is neither a 90 degrees left-right rotation)
method 2:
rotate the phone by 90 degrees so that it faces you and is vertical (this is a 90 degrees top-bottom rotation)
I need to XOR two BitmapData objects together.
I'm writing in Haxe, using the flash.* libraries and the AS3 compile target.
I've investigated HxSL and PixelBender, and neither one seems to have a bitwise XOR operator, nor do they have any other bitwise operators that could be used to create XOR (but am I missing something obvious? I'd accept any answer which gives a way to do a bitwise XOR using only the integer/float operators and functions available in HxSL or PixelBender).
None of the predefined filters or shaders in Flash that I can find seem to be able to do a XOR of two images (but again, am I missing something obvious? Can XOR be done with a combination of other filters).
I can find nothing like a XOR drawmode for drawing things onto other things (but that doesn't mean it doesn't exist! That would work too, if it exists!)
The only way I can find at the moment is a pixel-by-pixel loop over the image, but this takes a couple of seconds per image even on a fast machine, as opposed to filters, which I use for my other image processing operations, which are about a hundred times faster.
Is there any faster method?
Edit:
Playing around with this a bit more I found that removing the conditional and extra Vector access in the loop speeds it up by about 100ms on my machine.
Here's the previous XOR loop:
// Original Vector XOR code:
for (var i: int = 0; i < len; i++) {
    // XOR.
    result[i] = vec1[i] ^ vec2[i];
    if (ignoreAlpha) {
        // Force alpha of FF so we can see the result.
        result[i] |= 0xFF000000;
    }
}
Here is the updated XOR loop for the Vector solution:
var alphaMask: uint = 0;
if (ignoreAlpha) {
    // Force alpha of FF so we can see the result.
    alphaMask = 0xFF000000;
}
// Fewer Vector accessors makes it quicker:
for (var i: int = 0; i < len; i++) {
    // XOR.
    result[i] = alphaMask | (vec1[i] ^ vec2[i]);
}
Answer:
Here are the solutions that I've tested to XOR two images in Flash.
I found that the PixelBender solution is about 6-10 times slower than doing it in straight ActionScript.
I don't know if it's because I have a slow algorithm or it's just the limits of trying to fake bitwise operations in PixelBender.
Results:
PixelBender: ~6500ms
BitmapData.getVector(): ~480-500ms
BitmapData.getPixel32(): ~1200ms
BitmapData.getPixels(): ~1200ms
The clear winner is to use BitmapData.getVector() and then XOR the two streams of pixel data.
1. PixelBender solution
This is how I implemented the bitwise XOR in PixelBender, based on the formula given on Wikipedia: http://en.wikipedia.org/wiki/Bitwise_operation#Mathematical_equivalents
Here is a Gist of the final PBK: https://gist.github.com/Coridyn/67a0ff75afaa0163f673
On my machine running an XOR on two 3200x1400 images this takes about 6500-6700ms.
I first converted the formula to JavaScript to check that it was correct:
// Do it for each RGBA channel.
// Each channel is assumed to be 8bits.
function XOR(x, y){
var result = 0;
var bitCount = 8; // log2(x) + 1
for (var n = 0; n < bitCount; n++) {
var pow2 = pow(2, n);
var x1 = mod(floor(x / pow2), 2);
var y1 = mod(floor(y / pow2), 2);
var z1 = mod(x1 + y1, 2);
result += pow2 * z1;
}
console.log('XOR(%s, %s) = %s', x, y, result);
console.log('%s ^ %s = %s', x, y, (x ^ y));
return result;
}
// Split out these functions so it's
// easier to convert to PixelBender.
function mod(x, y){
return x % y;
}
function pow(x, y){
return Math.pow(x, y);
}
function floor(x){
return Math.floor(x);
}
Confirm that it's correct:
// Test the manual XOR is correct.
XOR(255, 85); // 170
XOR(170, 85); // 255
XOR(170, 170); // 0
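The three spot checks above can be extended to every pair of 8-bit values. The following is a small sketch, written in Java for illustration (the class and method names are hypothetical), that re-implements the same floor/mod formula and compares it with the built-in ^ operator.

```java
/** Hypothetical exhaustive check of the arithmetic XOR formula for 8-bit values. */
class ArithmeticXorCheck {

    /** XOR built only from division, floor, modulo, and addition. */
    static int xor8(int x, int y) {
        double result = 0;
        for (int n = 0; n < 8; n++) {
            double pow2 = Math.pow(2, n);
            double xBit = Math.floor(x / pow2) % 2; // n-th bit of x
            double yBit = Math.floor(y / pow2) % 2; // n-th bit of y
            result += pow2 * ((xBit + yBit) % 2);   // 1 only when the bits differ
        }
        return (int) result;
    }

    /** Counts the (x, y) pairs in 0..255 where the formula disagrees with ^. */
    static int countMismatches() {
        int mismatches = 0;
        for (int x = 0; x < 256; x++) {
            for (int y = 0; y < 256; y++) {
                if (xor8(x, y) != (x ^ y)) mismatches++;
            }
        }
        return mismatches;
    }
}
```

This only checks the formula on exact integers. Inside a shader the channel values arrive as floats, so rounding when converting between the 0.0-1.0 and 0-255 ranges is a separate concern.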
Then I converted the JavaScript to PixelBender by unrolling the loop using a series of macros:
// Bitwise algorithm was adapted from the "mathematical equivalents" formula on Wikipedia:
// http://en.wikipedia.org/wiki/Bitwise_operation#Mathematical_equivalents
// Macro for 2^n (it needs to be done a lot).
#define POW2(n) pow(2.0, n)
// Slight optimisation for the zeroth case - 2^0 = 1 is redundant so remove it.
#define XOR_i_0(x, y) ( mod( mod(floor(x), 2.0) + mod(floor(y), 2.0), 2.0 ) )
// Calculations for a given "iteration".
#define XOR_i(x, y, i) ( POW2(i) * ( mod( mod(floor(x / POW2(i)), 2.0) + mod(floor(y / POW2(i)), 2.0), 2.0 ) ) )
// Flash doesn't support loops.
// Unroll the loop by defining macros that call the next macro in the sequence.
// Adapted from: http://www.simppa.fi/blog/category/pixelbender/
// http://www.simppa.fi/source/LoopMacros2.pbk
#define XOR_0(x, y) XOR_i_0(x, y)
#define XOR_1(x, y) XOR_i(x, y, 1.0) + XOR_0(x, y)
#define XOR_2(x, y) XOR_i(x, y, 2.0) + XOR_1(x, y)
#define XOR_3(x, y) XOR_i(x, y, 3.0) + XOR_2(x, y)
#define XOR_4(x, y) XOR_i(x, y, 4.0) + XOR_3(x, y)
#define XOR_5(x, y) XOR_i(x, y, 5.0) + XOR_4(x, y)
#define XOR_6(x, y) XOR_i(x, y, 6.0) + XOR_5(x, y)
#define XOR_7(x, y) XOR_i(x, y, 7.0) + XOR_6(x, y)
// Entry point for XOR function.
// This will calculate the XOR the current pixels.
#define XOR(x, y) XOR_7(x, y)
// PixelBender uses floats from 0.0 to 1.0 to represent 0 to 255
// but the bitwise operations above work on ints.
// These macros convert between float and int values.
#define FLOAT_TO_INT(x) float(x) * 255.0
#define INT_TO_FLOAT(x) float(x) / 255.0
XOR for each channel of the current pixel in the evaluatePixel function:
void evaluatePixel()
{
// Acquire the pixel values from both images at the current location.
float4 frontPixel = sampleNearest(inputImage, outCoord());
float4 backPixel = sampleNearest(diffImage, outCoord());
// Set up the output variable - RGBA.
pixel4 result = pixel4(0.0, 0.0, 0.0, 1.0);
// XOR each channel.
result.r = INT_TO_FLOAT ( XOR(FLOAT_TO_INT(frontPixel.r), FLOAT_TO_INT(backPixel.r)) );
result.g = INT_TO_FLOAT ( XOR(FLOAT_TO_INT(frontPixel.g), FLOAT_TO_INT(backPixel.g)) );
result.b = INT_TO_FLOAT ( XOR(FLOAT_TO_INT(frontPixel.b), FLOAT_TO_INT(backPixel.b)) );
// Return the result for this pixel.
dst = result;
}
ActionScript Solutions
2. BitmapData.getVector()
I found the fastest solution is to extract a Vector of pixels from the two images and perform the XOR in ActionScript.
For the same two 3200x1400 images this takes about 480-500ms.
package diff
{
import flash.display.Bitmap;
import flash.display.DisplayObject;
import flash.display.IBitmapDrawable;
import flash.display.BitmapData;
import flash.geom.Rectangle;
import flash.utils.ByteArray;
/**
* @author Coridyn
*/
public class BitDiff
{
/**
* Perform a binary diff between two images.
*
* Return the result as a Vector of uints (as used by BitmapData).
*
* @param image1
* @param image2
* @param ignoreAlpha
* @return
*/
public static function diffImages(image1: DisplayObject,
image2: DisplayObject,
ignoreAlpha: Boolean = true): Vector.<uint> {
// For simplicity get the smallest common width and height of the two images
// to perform the XOR.
var w: Number = Math.min(image1.width, image2.width);
var h: Number = Math.min(image1.height, image2.height);
var rect: Rectangle = new Rectangle(0, 0, w, h);
var vec1: Vector.<uint> = BitDiff.getVector(image1, rect);
var vec2: Vector.<uint> = BitDiff.getVector(image2, rect);
var resultVec: Vector.<uint> = BitDiff.diffVectors(vec1, vec2, ignoreAlpha);
return resultVec;
}
/**
* Extract a portion of an image as a Vector of uints.
*
* @param drawable
* @param rect
* @return
*/
public static function getVector(drawable: DisplayObject, rect: Rectangle): Vector.<uint> {
var data: BitmapData = BitDiff.getBitmapData(drawable);
var vec: Vector.<uint> = data.getVector(rect);
data.dispose();
return vec;
}
/**
* Perform a binary diff between two streams of pixel data.
*
* If `ignoreAlpha` is false then will not normalise the
* alpha to make sure the pixels are opaque.
*
* @param vec1
* @param vec2
* @param ignoreAlpha
* @return
*/
public static function diffVectors(vec1: Vector.<uint>,
vec2: Vector.<uint>,
ignoreAlpha: Boolean): Vector.<uint> {
var larger: Vector.<uint> = vec1;
if (vec1.length < vec2.length) {
larger = vec2;
}
var len: Number = Math.min(vec1.length, vec2.length),
result: Vector.<uint> = new Vector.<uint>(len, true);
var alphaMask: uint = 0;
if (ignoreAlpha) {
// Force alpha of FF so we can see the result.
alphaMask = 0xFF000000;
}
// Assume same length.
for (var i: int = 0; i < len; i++) {
// XOR.
result[i] = alphaMask | (vec1[i] ^ vec2[i]);
}
if (vec1.length != vec2.length) {
// Splice the remaining items.
result = result.concat(larger.slice(len));
}
return result;
}
}
}
3. BitmapData.getPixel32()
Your current approach of looping over the BitmapData with BitmapData.getPixel32() gave a similar speed of about 1200ms:
for (var y: int = 0; y < h; y++) {
    for (var x: int = 0; x < w; x++) {
        sourcePixel = bd1.getPixel32(x, y);
        // Use getPixel32 on both images so the alpha channel is XORed too.
        resultPixel = sourcePixel ^ bd2.getPixel32(x, y);
        result.setPixel32(x, y, resultPixel);
    }
}
4. BitmapData.getPixels()
My final test was to try iterating over two ByteArrays of pixel data (very similar to the Vector solution above). This implementation also took about 1200ms:
/**
* Extract a portion of an image as a ByteArray of pixel data.
*
* @param drawable
* @param rect
* @return
*/
public static function getByteArray(drawable: DisplayObject, rect: Rectangle): ByteArray {
var data: BitmapData = BitDiff.getBitmapData(drawable);
var pixels: ByteArray = data.getPixels(rect);
data.dispose();
return pixels;
}
/**
* Perform a binary diff between two streams of pixel data.
*
* If `ignoreAlpha` is false then will not normalise the
* alpha to make sure the pixels are opaque.
*
* @param ba1
* @param ba2
* @param ignoreAlpha
* @return
*/
public static function diffByteArrays(ba1: ByteArray,
ba2: ByteArray,
ignoreAlpha: Boolean): ByteArray {
// Reset position to start of array.
ba1.position = 0;
ba2.position = 0;
var larger: ByteArray = ba1;
if (ba1.bytesAvailable < ba2.bytesAvailable) {
larger = ba2;
}
var len: Number = Math.min(ba1.length / 4, ba2.length / 4),
result: ByteArray = new ByteArray();
// Assume same length.
var resultPixel:uint;
for (var i: uint = 0; i < len; i++) {
// XOR.
resultPixel = ba1.readUnsignedInt() ^ ba2.readUnsignedInt();
if (ignoreAlpha) {
// Force alpha of FF so we can see the result.
resultPixel |= 0xFF000000;
}
result.writeUnsignedInt(resultPixel);
}
// Seek back to the start.
result.position = 0;
return result;
}
There are a few possible options depending on what you want to achieve (e.g. is the XOR per channel or is it just any pixel that is non-black?).
There is the BitmapData.compare() method which can give you a lot of information about the two bitmaps. You could BitmapData.threshold() the input data before comparing.
Another option would be to use the draw method with the BlendMode.DIFFERENCE blend mode to draw your two images into the same BitmapData instance. That will show you the difference between the two images (equivalent to the Difference blending mode in Photoshop).
If you need to check if any pixel is non-black then you can try running a BitmapData.threshold first and then draw the result with the difference blend mode as above for the two images.
Are you doing this for image processing or something else like per-pixel hit detection?
To start with I'd have a look at BitmapData and see what is available to play with.
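Keep in mind that a difference blend is not a bitwise XOR: per channel it subtracts the darker value from the lighter one, which only sometimes coincides with XOR. A short illustrative sketch in Java (the class and method names are hypothetical, and the pixel layout is assumed to be packed ARGB):

```java
/** Hypothetical comparison of a per-channel difference blend with bitwise XOR. */
class DifferenceVsXor {

    /** Per-channel result of a difference blend: lighter minus darker. */
    static int difference(int a, int b) {
        return Math.abs(a - b);
    }

    /** Difference blend of two packed ARGB pixels, forcing the result opaque. */
    static int differencePixel(int p, int q) {
        int r = difference((p >>> 16) & 0xff, (q >>> 16) & 0xff);
        int g = difference((p >>> 8) & 0xff, (q >>> 8) & 0xff);
        int b = difference(p & 0xff, q & 0xff);
        return 0xFF000000 | (r << 16) | (g << 8) | b;
    }
}
```

For the channel values 255 and 85, both operations give 170. For 170 and 85, the difference is 85 while 170 ^ 85 is 255. Both results are zero exactly when the two channel values are equal, so either works for spotting changed pixels, but only XOR can be undone by applying it a second time.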
I'd like to know if there is a Java way to, given a polygon, draw another one at a given distance and with the same center.
I tried AffineTransform but don't really know how it works.
Thank you.
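You need to scale the polygon and translate it so that its centroid stays where it was. For a scale factor of 2, that translation is minus the centroid's coordinates, which is what the code below uses. I have included the code from http://paulbourke.net/geometry/polygonmesh/PolygonUtilities.java to calculate the centroid of a polygon.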
public void drawPolygon(){
Graphics2D g2 = bufferedImage.createGraphics();
Polygon poly=new Polygon();
poly.addPoint(100, 100);
poly.addPoint(200, 100);
poly.addPoint(200, 200);
poly.addPoint(150, 250);
poly.addPoint(100, 200);
poly.addPoint(100, 100);
g2.setColor(Color.blue);
g2.fillPolygon(poly);
g2.setColor(Color.red);
Point2D.Double []pts=new Point2D.Double[poly.npoints];
for (int i=0;i<poly.npoints;i++){
pts[i]=new Point2D.Double(poly.xpoints[i],poly.ypoints[i]);
}
Point2D centroid=centerOfMass(pts);
g2.translate(-centroid.getX(), -centroid.getY());
g2.scale(2, 2);
g2.drawPolygon(poly);
}
public static double area(Point2D[] polyPoints) {
int i, j, n = polyPoints.length;
double area = 0;
for (i = 0; i < n; i++) {
j = (i + 1) % n;
area += polyPoints[i].getX() * polyPoints[j].getY();
area -= polyPoints[j].getX() * polyPoints[i].getY();
}
area /= 2.0;
return (area);
}
/**
* Function to calculate the center of mass for a given polygon, according
* to the algorithm defined at
* http://local.wasp.uwa.edu.au/~pbourke/geometry/polyarea/
*
* @param polyPoints
* array of points in the polygon
* @return point that is the center of mass
*/
public static Point2D centerOfMass(Point2D[] polyPoints) {
double cx = 0, cy = 0;
double area = area(polyPoints);
// could change this to Point2D.Float if you want to use less memory
Point2D res = new Point2D.Double();
int i, j, n = polyPoints.length;
double factor = 0;
for (i = 0; i < n; i++) {
j = (i + 1) % n;
factor = (polyPoints[i].getX() * polyPoints[j].getY()
- polyPoints[j].getX() * polyPoints[i].getY());
cx += (polyPoints[i].getX() + polyPoints[j].getX()) * factor;
cy += (polyPoints[i].getY() + polyPoints[j].getY()) * factor;
}
area *= 6.0f;
factor = 1 / area;
cx *= factor;
cy *= factor;
res.setLocation(cx, cy);
return res;
}
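For a scale factor other than 2, the translation has to change with the factor. A minimal sketch, assuming the goal is to scale by a factor s about the centroid (cx, cy): build one AffineTransform that moves the centroid to the origin, scales, and moves it back. The class and method names are hypothetical. AffineTransform concatenates translate and scale calls so that the last one specified is applied to the points first.

```java
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

/** Hypothetical helper: scales shapes about a fixed point such as the centroid. */
class ScaleAboutPoint {

    /** Transform mapping p to c + s * (p - c), where c = (cx, cy). */
    static AffineTransform scaleAbout(double cx, double cy, double s) {
        AffineTransform at = new AffineTransform();
        at.translate(cx, cy);   // applied last: move the centroid back
        at.scale(s, s);         // applied second: scale around the origin
        at.translate(-cx, -cy); // applied first: move the centroid to the origin
        return at;
    }

    /** Returns a scaled copy of the shape; the original is left untouched. */
    static Shape scaled(Shape shape, Point2D centroid, double s) {
        return scaleAbout(centroid.getX(), centroid.getY(), s).createTransformedShape(shape);
    }
}
```

With the variables from drawPolygon() above, something like g2.draw(ScaleAboutPoint.scaled(poly, centroid, 1.5)) would draw the enlarged outline without changing the transform of g2. Note that scaling moves each edge by a different amount, so it is not the same as offsetting every edge by a fixed distance.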
Another way of doing this, common in the GIS world, is to buffer a polygon. There is a library called Java Topology Suite that will provide this functionality, although it might be harder to figure out what the scale factor is.
There are some very interesting discussions about polygon growing in this post: An algorithm for inflating/deflating (offsetting, buffering) polygons
I'm running CUDA 5.0, with compute_30,sm_30 set using a 670.
I create a mipmapped array via:
cudaExtent size;
size.width = window_width; // 600
size.height = window_height; // 600
size.depth = 1;
int levels = getMipMapLevels(size);
levels = MIN(levels, 9); // 9
cudaChannelFormatDesc fp32;
fp32.f = cudaChannelFormatKindFloat;
fp32.x = fp32.y = fp32.z = fp32.w = 32;
cudaMipmappedArray_t A;
checkCuda(cudaMallocMipmappedArray(&A, &fp32, size, levels, cudaArraySurfaceLoadStore));
I load the first level of A with surf2Dwrites. I know that works since I copy that array to the host and dump it to an image file. I now wish to fill the other miplevels of A with the mipmaps. One iteration through that loop looks like:
width >>= 1; width = MAX(1, width);
height >>= 1; height = MAX(1, height);
cudaArray_t from, to;
checkCuda(cudaGetMipmappedArrayLevel(&from, A, newlevel-1));
checkCuda(cudaGetMipmappedArrayLevel(&to, A, newlevel));
cudaTextureObject_t from_texture;
create_texture_object(from, true, &from_texture);
cudaSurfaceObject_t to_surface;
create_surface_object(to, &to_surface);
dim3 blocksize(16, 16, 1);
dim3 gridsize((width+blocksize.x-1)/blocksize.x,(height+blocksize.y-1)/blocksize.y, 1);
d_mipmap<<<gridsize, blocksize>>>(to_surface, from_texture, width, height);
checkCuda(cudaDeviceSynchronize());
checkCuda(cudaGetLastError());
uncreate_texture_object(&from_texture);
uncreate_surface_object(&to_surface);
The create_surface_object() code is known to work. Just in case, here's the create_texture_object() code:
static void create_texture_object(cudaArray_t tarray, bool filter_linear, cudaTextureObject_t *tobject)
{
assert(tarray && tobject);
// build the resource
cudaResourceDesc color_res;
memset(&color_res, 0, sizeof(cudaResourceDesc));
color_res.resType = cudaResourceTypeArray;
color_res.res.array.array = tarray;
// the texture descriptor
cudaTextureDesc texdesc;
memset(&texdesc, 0, sizeof(cudaTextureDesc));
texdesc.addressMode[0] = cudaAddressModeClamp;
texdesc.addressMode[1] = cudaAddressModeClamp;
texdesc.addressMode[2] = cudaAddressModeClamp;
texdesc.filterMode = filter_linear ? cudaFilterModeLinear : cudaFilterModePoint;
texdesc.normalizedCoords = 1;
checkCuda(cudaCreateTextureObject(tobject, &color_res, &texdesc, NULL));
}
The d_mipmap device function is the following:
__global__ void
d_mipmap(cudaSurfaceObject_t out, cudaTextureObject_t in, int w, int h)
{
    float x = blockIdx.x * blockDim.x + threadIdx.x;
    float y = blockIdx.y * blockDim.y + threadIdx.y;
    float dx = 1.0/float(w);
    float dy = 1.0/float(h);
    if ((x < w) && (y < h))
    {
#if 0
        // intended mipmap sampling: average four taps of the previous level
        float4 color =
            (tex2D<float4>(in, (x + .25f) * dx, (y + .25f) * dy)) +
            (tex2D<float4>(in, (x + .75f) * dx, (y + .25f) * dy)) +
            (tex2D<float4>(in, (x + .25f) * dx, (y + .75f) * dy)) +
            (tex2D<float4>(in, (x + .75f) * dx, (y + .75f) * dy));
        color /= 4.0f;
        surf2Dwrite(color, out, x * sizeof(float4), y);
#endif
        // debugging: write a single tap straight through
        float4 color0 = tex2D<float4>(in, (x + .25f) * dx, (y + .25f) * dy);
        surf2Dwrite(color0, out, x * sizeof(float4), y);
    }
}
That contains both the mipmap sampling code (if'd out) plus debugging code.
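When debugging a kernel like this, a CPU reference for one level can help tell whether unexpected values come from the sampling or from the write. The following is a rough sketch, in Java, of a plain 2x2 box-filter downsample on a single-channel float image, together with the usual level count for a full mip chain. Both are assumptions for illustration: the definition of getMipMapLevels() is not shown above, the kernel's four-tap linear sampling is not exactly a box filter, and the class and method names here are hypothetical.

```java
/** Hypothetical CPU reference for one mipmap reduction step (single float channel). */
class MipmapReference {

    /** Levels in a full chain down to 1x1: 1 + floor(log2(max(w, h))). */
    static int levelCount(int w, int h) {
        int levels = 1;
        for (int size = Math.max(w, h); size > 1; size >>= 1) levels++;
        return levels;
    }

    /** Averages each 2x2 block of src (w x h, row-major) into a max(1, w/2) x max(1, h/2) image. */
    static float[] downsample(float[] src, int w, int h) {
        int ow = Math.max(1, w >> 1), oh = Math.max(1, h >> 1);
        float[] dst = new float[ow * oh];
        for (int y = 0; y < oh; y++) {
            for (int x = 0; x < ow; x++) {
                // Clamp so sources that are 1 pixel wide or high still work.
                int x0 = Math.min(2 * x, w - 1), x1 = Math.min(2 * x + 1, w - 1);
                int y0 = Math.min(2 * y, h - 1), y1 = Math.min(2 * y + 1, h - 1);
                dst[y * ow + x] = 0.25f * (src[y0 * w + x0] + src[y0 * w + x1]
                                         + src[y1 * w + x0] + src[y1 * w + x1]);
            }
        }
        return dst;
    }
}
```

For a 600x600 base level this level count is 10, which the MIN(levels, 9) above would cap at 9.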
The problem is, color0 is always uniformly zero, and I've been unable to understand why. I've changed the filtering to point (from linear) with no success. I've checked for errors. Nothing.
I am using CUDA/OpenGL interop here, but the mipmap generation is being done on CUDA arrays only.
I really really do not want to have to use texture references.
Any suggestions on where to look?
The bug turns out to be the use of cudaMipmappedArrays (either the array or the texture object -- I'm unable to tell which is broken.)
When I modify the code to use cudaArrays only, the texture reference starts working again.
Since the bindless texture program sample works, the bug appears to be limited to float32 channel mipmapped textures only. (I have a test program that shows the bug occurs with both 1 and 4 channel float32 mipmapped textures.)
I've reported the bug to Nvidia.