I have a task to do: I need to run a flood fill algorithm on CUDA. On the CPU I have a non-recursive, queue-based method, but I don't have any idea how to move this code to the GPU so that it runs faster. Can anybody help?
Edit: this is my CPU code, just a normal flood fill with a few small modifications:
void cpuFloodFill(std::vector<std::vector<int>> *colorVector, int node)
{
    std::queue<int> q;
    q.push(node);
    int i, j;
    while (!q.empty())
    {
        int k = q.front();
        q.pop();
        k2ij(k, &i, &j);
        if ((*colorVector)[i][j] == COLOR_TARGET)
        {
            (*colorVector)[i][j] = COLOR_REPLACEMENT;
            if (i - 1 >= 0 && i - 1 < X && j >= 0 && j < Y)
                q.push(ij2k(i - 1, j));
            if (i + 1 >= 0 && i + 1 < X && j >= 0 && j < Y)
                q.push(ij2k(i + 1, j));
            if (i >= 0 && i < X && j - 1 >= 0 && j - 1 < Y)
                q.push(ij2k(i, j - 1));
            if (i >= 0 && i < X && j + 1 >= 0 && j + 1 < Y)
                q.push(ij2k(i, j + 1));
        }
    }
}
There's a GPU flood fill implementation in an image skeletonization toolkit named CUDA Skel. The link to its source code is on the website. Please note the license of the code: the source and toolkit are free for research purposes with due citation.
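For a rough idea of how a queue-based fill can be restructured for a GPU, here is a sketch. It is not the CUDA Skel code, only one common pattern under my own assumptions: replace the queue with repeated full-grid sweeps in which each cell looks only at its four neighbours from the previous sweep. It is written in plain C++ so it runs anywhere. The names Grid, Mask, sweep and floodFillBySweeps are made up for illustration, and the double loop in sweep stands in for what would be one CUDA thread per cell.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Grid = std::vector<std::vector<int>>;   // hypothetical alias for the color grid
using Mask = std::vector<std::vector<char>>;  // 1 = cell already belongs to the fill

// One data-parallel sweep: each (i, j) reads only `cur` and writes only
// next[i][j], so every iteration could run as an independent GPU thread.
// Returns true if at least one new cell joined the filled region.
bool sweep(const Grid& grid, int target, const Mask& cur, Mask& next) {
    const int rows = static_cast<int>(grid.size());
    const int cols = static_cast<int>(grid[0].size());
    bool changed = false;
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            next[i][j] = cur[i][j];
            if (cur[i][j] || grid[i][j] != target) continue;
            const bool neighborFilled =
                (i > 0 && cur[i - 1][j]) || (i + 1 < rows && cur[i + 1][j]) ||
                (j > 0 && cur[i][j - 1]) || (j + 1 < cols && cur[i][j + 1]);
            if (neighborFilled) {
                next[i][j] = 1;
                changed = true;
            }
        }
    }
    return changed;
}

// Sweeps until the region stops growing, then recolors it.
// Assumes a rectangular grid and a seed (si, sj) that lies inside it.
void floodFillBySweeps(Grid& grid, int si, int sj, int target, int replacement) {
    if (grid.empty() || grid[0].empty() || grid[si][sj] != target) return;
    Mask cur(grid.size(), std::vector<char>(grid[0].size(), 0));
    Mask next = cur;
    cur[si][sj] = 1;
    while (sweep(grid, target, cur, next)) std::swap(cur, next);
    for (std::size_t i = 0; i < grid.size(); ++i)
        for (std::size_t j = 0; j < grid[i].size(); ++j)
            if (cur[i][j]) grid[i][j] = replacement;
}
```

Calling floodFillBySweeps(grid, 0, 0, COLOR_TARGET, COLOR_REPLACEMENT) would fill the region connected to the top-left cell. In a CUDA port, the body of the double loop would become the kernel, `changed` would be a flag in device memory, and the host would relaunch the kernel until the flag stays clear. The number of sweeps grows with the longest path inside the region, so this trades extra passes for parallelism within each pass.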
As the title says, I'm trying to pass a struct containing 4 matrices to a CUDA kernel. The problem is that I get no errors, but the program goes nuts whenever I try to execute it. All of the values returned are 0 and the clock value overflows.
Here's what I've made so far:
#define ROWS 700
#define COLS 1244
struct sobel {
    int Gradient[ROWS][COLS];
    int Image_input[ROWS][COLS];
    int G_x[ROWS][COLS];
    int G_y[ROWS][COLS];
};
__global__ void sobel(struct sobel* data)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int XLENGTH = ROWS;
    int YLENGTH = COLS;
    if ((x < XLENGTH) && (y < YLENGTH))
    {
        if (x == 0 || x == XLENGTH - 1 || y == 0 || y == YLENGTH - 1)
        {
            data->G_x[x][y] = data->G_y[x][y] = data->Gradient[x][y] = 0;
        }
        else
        {
            data->G_x[x][y] = data->Image_input[x + 1][y - 1]
                + 2 * data->Image_input[x + 1][y]
                + data->Image_input[x + 1][y + 1]
                - data->Image_input[x - 1][y - 1]
                - 2 * data->Image_input[x - 1][y]
                - data->Image_input[x - 1][y + 1];
            data->G_y[x][y] = data->Image_input[x - 1][y + 1]
                + 2 * data->Image_input[x][y + 1]
                + data->Image_input[x + 1][y + 1]
                - data->Image_input[x - 1][y - 1]
                - 2 * data->Image_input[x][y - 1]
                - data->Image_input[x + 1][y - 1];
            data->Gradient[x][y] = abs(data->G_x[x][y]) + abs(data->G_y[x][y]);
            if (data->Gradient[x][y] > 255) {
                data->Gradient[x][y] = 255;
            }
        }
    }
}
int main() {
    struct sobel* data = (struct sobel*)calloc(sizeof(*data), 1);
    struct sobel* dev_data;
    cudaMalloc((void**)&dev_data, sizeof(*data));
    cudaMemcpy(dev_data, data, sizeof(data), cudaMemcpyHostToDevice);
    dim3 blocksize(16, 16);
    dim3 gridsize;
    gridsize.x = (ROWS + blocksize.x - 1) / blocksize.x;
    gridsize.y = (COLS + blocksize.y - 1) / blocksize.y;
    sobel <<< gridsize, blocksize >>> (dev_data);
    cudaMemcpy(data, dev_data, sizeof(data), cudaMemcpyDeviceToHost);
    free(data);
    cudaFree(dev_data);
    return 0;
}
Do I also have to allocate device memory for each one of the matrices?
Any advice would be appreciated.
Edit: I switched a couple of things here, but the program seems to ignore the nested else statement and all the values returned are 0.
There are (at least) 2 errors in your code.
You have not allocated a correct size for the device struct:
cudaMalloc((void**)&dev_data, sizeof(data));
                              ^
Just like in your calloc call, that should be sizeof(*data), not sizeof(data). (Both cudaMemcpy calls should be updated to use this size as well.)
You need a proper thread check in your kernel code, something like this:
if (( x < XLENGTH ) && ( y < YLENGTH )){ // add this line
if (x == 0 || x == XLENGTH - 1 || y == 0 || y == YLENGTH - 1)
{
data->G_x[x][y] = data->G_y[x][y] = data->Gradient[x][y] = 0;
Without that, your next if test line may allow out-of-bounds threads to participate in the zeroing operation. For example any thread where x == 0 will pass that if-test. But that thread may have an out-of-bounds y-value.
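To make both points concrete, here is a small host-only C++ sketch (no CUDA needed) that shows how far apart the two sizeof expressions are, and how many surplus threads the 16x16 launch creates. The struct is renamed SobelData and the helper functions are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>

constexpr int ROWS = 700;
constexpr int COLS = 1244;

// Same layout as `struct sobel` in the question; renamed here (hypothetical
// name) to keep it distinct from the kernel.
struct SobelData {
    int Gradient[ROWS][COLS];
    int Image_input[ROWS][COLS];
    int G_x[ROWS][COLS];
    int G_y[ROWS][COLS];
};

// What sizeof(data) yields: the size of the pointer itself.
std::size_t bytesIfSizedByPointer(const SobelData* data) { return sizeof(data); }

// What sizeof(*data) yields: the size of the whole struct.
// The operand of sizeof is not evaluated, so no dereference happens here.
std::size_t bytesIfSizedByStruct(const SobelData* data) { return sizeof(*data); }

// Number of blocks needed to cover n elements, as in the gridsize computation.
constexpr unsigned blocksFor(unsigned n, unsigned blockSize) {
    return (n + blockSize - 1) / blockSize;
}

// Threads launched along one dimension beyond the n valid indices.
constexpr unsigned surplusThreads(unsigned n, unsigned blockSize) {
    return blocksFor(n, blockSize) * blockSize - n;
}
```

With a 4-byte int the struct is 4 * 700 * 1244 * 4 = 13,932,800 bytes, whereas the pointer is typically 8 bytes on a 64-bit host, so a copy sized with sizeof(data) moves only the first few bytes of Image_input's neighbour Gradient and nothing else. The launch covers 44 * 16 = 704 by 78 * 16 = 1248 threads for 700 by 1244 elements, which is why the bounds check is required.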
I have a question regarding partial match of two strings.
I have a string and I need to validate it. To be more specific, I have an output from OCR reading and it contains some mistakes, of course. I need to check if the string is really there but as it can be written incorrectly I need only 70% match.
Is it possible to do that in UiPath? The string is in a Notepad (.txt) file, so any ideas would be helpful.
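Try scoring each word detected by the OCR against the expected base word with a Levenshtein-based similarity (fuzzyness is a double between 0 and 1, so a 70% match corresponds to 0.7):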
#include <algorithm>
#include <list>
#include <string>
#include <vector>
using namespace std;

// Edit distance between src and dest; swapping two adjacent characters counts as one edit.
int LevenshteinDistance(const string& src, const string& dest) {
    const size_t n = src.size(), m = dest.size();
    vector<vector<int>> d(n + 1, vector<int>(m + 1, 0));
    for (size_t i = 0; i <= n; i++) d[i][0] = (int)i;
    for (size_t j = 0; j <= m; j++) d[0][j] = (int)j;
    for (size_t i = 1; i <= n; i++) {
        for (size_t j = 1; j <= m; j++) {
            int cost = (src[i - 1] == dest[j - 1]) ? 0 : 1;
            d[i][j] = min(d[i - 1][j] + 1, min(d[i][j - 1] + 1, d[i - 1][j - 1] + cost));
            if (i > 1 && j > 1 && src[i - 1] == dest[j - 2] && src[i - 2] == dest[j - 1])
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + cost);
        }
    }
    return d[n][m];
}

// Returns the words in wordList whose similarity to word is above fuzzyness (0..1).
list<string> Search(const string& word, const list<string>& wordList, double fuzzyness) {
    list<string> foundWords;
    for (const string& s : wordList) {
        size_t length = max(word.length(), s.length());
        if (length == 0) continue;  // two empty strings: nothing to score
        double score = 1.0 - (double)LevenshteinDistance(word, s) / length;
        if (score > fuzzyness) foundWords.push_back(s);
    }
    // Several candidates: raise the threshold in small steps until at most one is left.
    // The step of 0.05 is an arbitrary choice; d stays below 1.0, so this always terminates.
    for (double d = fuzzyness + 0.05; foundWords.size() > 1 && d < 1.0; d += 0.05)
        foundWords = Search(word, wordList, d);
    return foundWords;
}
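For the specific "at least 70% match" check from the question, a self-contained sketch could look like the following. It uses the plain Levenshtein distance (no transpositions) in a two-row form; editDistance, similarity and matchesAtLeast are names I made up for this example, and how you call such code from UiPath is outside what this sketch covers.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Plain Levenshtein distance (insert, delete, substitute), keeping only two rows.
int editDistance(const std::string& a, const std::string& b) {
    std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost});
        }
        std::swap(prev, cur);  // the row just computed becomes the previous row
    }
    return prev[b.size()];
}

// Similarity in [0, 1]: 1.0 means identical strings.
double similarity(const std::string& a, const std::string& b) {
    const std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) return 1.0;  // two empty strings are treated as identical
    return 1.0 - static_cast<double>(editDistance(a, b)) / static_cast<double>(longest);
}

// True when the OCR text is at least `threshold` similar to the expected text.
bool matchesAtLeast(const std::string& ocrText, const std::string& expected,
                    double threshold) {
    return similarity(ocrText, expected) >= threshold;
}
```

For instance, matchesAtLeast("lnvoice 2O23", "Invoice 2023", 0.70) accepts a reading with two misrecognized characters out of twelve, since at most two substitutions are needed.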
I am trying to prove a sort function in Frama-C. However, I cannot get the invariants of the outer loop proved:
loop invariant 0 <= i <l;
loop invariant 0 < i < l ==> \forall int a,b; 0<=b <=l-i-1 <=a < l ==>
t[a]>=t[b];
These invariants always end up with orange bullets. I have looked at many examples and cannot find the reason. Can someone help me? Thanks!
The following is my source code:
/*@ predicate Swap{L1,L2}(int *a, integer l, integer i, integer j) =
      \at(a[i],L1) == \at(a[j],L2) &&
      \at(a[j],L1) == \at(a[i],L2) &&
      \forall integer k; k != i && k != j
        ==> \at(a[k],L1) == \at(a[k],L2);
*/
/*@ predicate Sorted{L}(int *a, integer l, integer h) =
      \forall integer i,j; l <= i <= j < h ==> a[i] <= a[j] ;
*/
/*@ requires \valid(t + (0..l-1));
    requires 0 <= i < l;
    requires 0 <= j < l;
    assigns t[i],t[j];
    ensures Swap{Old,Here}(t,l,i,j);
*/
void swap(int *t, int l, int i,int j){
    int tmp;
    tmp = t[i];
    t[i] = t[j];
    t[j] = tmp;
    return;
}
/*@ requires l >0;
    requires \valid(t + (0..l-1));
    ensures (\forall integer a; 0<=a <l
      ==> (\exists integer b; 0<= b < l
        ==> \at(t[b],Old)== \at(t[a],Here) ));
    ensures Sorted{Here}(t, 0, l-1);
*/
void sort(int *t, int l) {
    int i;
    int j;
    i=j=0;
    /*@ loop invariant 0 <= i <l;
        loop invariant 0 < i < l ==> \forall int a,b; 0<=b <=l-i-1 <=a < l ==>
          t[a]>=t[b];
    */
    for (i=0;i<l;i++) {
        /*@
            loop invariant 0<= j < l;
            loop invariant 0 < j < l ==>\forall int a; 0<= a <= j ==> t[a]<=t[j];
        */
        for (j=0;j<l-1;j++) {
            if (t[j] > t[j+1]){
                swap(t,l ,j, j+1);}
        }
    }
}
and I use
frama-c-gui -wp sort.c
At university I was given the following code and asked to explain briefly what it does and what the value of x is at the end of the run, as a function of n. I hope someone can help me.
x = 0;
for(int i = n; i > 1; i--) {
    for(int j = 1; j < i; j--) {
        x +=5;
    }
}
Thanks
(I assume you meant to write "j++" instead of "j--", and not end up in an infinite loop?)
If so, just execute it by hand.
The outer loop iterates with i over the integers, from n down to 2 (inclusive).
At each iteration of that loop, the inner loop iterates with j over the integers from 1 up to i - 1 (inclusive).
thus, x is incremented by 5 for each of:
j = 1, 2, ... n - 1
then, each of:
j = 1, 2, ... n - 2
then, etc,
...
until,
j = 1
if I'm not mistaken, that's n * (n - 1) / 2 iterations in total
(cf. the arithmetic progression)
to give eventually,
x == 5 * n * (n - 1) / 2
E.g., for n = 3:
x == 15
HTH
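If you would rather check the formula than trust the hand calculation, a short sketch like this compares a direct simulation against the closed form. It assumes, as above, that the inner loop was meant to use j++; simulate and closedForm are names chosen here for illustration.

```cpp
// Runs the loops from the question, with the inner update corrected to j++
// (an assumption about what was intended).
long long simulate(int n) {
    long long x = 0;
    for (int i = n; i > 1; i--) {
        for (int j = 1; j < i; j++) {
            x += 5;
        }
    }
    return x;
}

// Closed form: (n-1) + (n-2) + ... + 1 = n * (n - 1) / 2 iterations, 5 per iteration.
// For n < 2 the outer loop never runs, so x stays 0.
long long closedForm(long long n) {
    return n < 2 ? 0 : 5 * n * (n - 1) / 2;
}
```

Both functions return 15 for n = 3: the inner loop runs twice for i = 3 and once for i = 2, giving three increments of 5.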
for(int i = n; i > 1; i--) {
for(int j = 1; j < i; j--) {
Here i > 1, and j starts at 1 and is only ever decremented (j--).
So j will always be less than i, and the inner loop becomes an infinite loop (at least until j overflows).
A && B || C && D
(A && B) || (C && D)
Are these two boolean expressions equivalent in C++? I am confused.
Whether or not they're equal depends entirely on operator precedence. If && takes precedence over ||, then yes. Otherwise, no. In C++, && does bind tighter than ||, so there the two expressions are equivalent.
In most programming languages you'll find that operator && has higher priority than ||.
So, for example, in Java, C#, C, C++, Python (where the operators are spelled and / or), Ruby, etc.
A && B || C && D
is equivalent to
(A && B) || (C && D)
You can even copy-paste the code:
#include <iostream>
using namespace std;
int main() {
    bool A = false;
    bool B = false;
    bool C = true;
    bool D = true;
    for(int i = 0; i < 2; ++i) {
        A = (i == 0);
        for(int j = 0; j < 2; ++j) {
            B = (j == 0);
            for(int k = 0; k < 2; ++k) {
                C = (k == 0);
                for(int l = 0; l < 2; ++l) {
                    D = (l == 0);
                    cout << A << " " << B << " " << C << " " << D << " -> ";
                    cout << ((A && B || C && D) == ((A && B) || (C && D))) << endl;
                }
            }
        }
    }
    return 0;
}
to Ideone to find out for yourself. In C++ for example the output is:
1 1 1 1 -> 1
1 1 1 0 -> 1
1 1 0 1 -> 1
1 1 0 0 -> 1
1 0 1 1 -> 1
1 0 1 0 -> 1
1 0 0 1 -> 1
1 0 0 0 -> 1
0 1 1 1 -> 1
0 1 1 0 -> 1
0 1 0 1 -> 1
0 1 0 0 -> 1
0 0 1 1 -> 1
0 0 1 0 -> 1
0 0 0 1 -> 1
0 0 0 0 -> 1
So ((A && B || C && D) == ((A && B) || (C && D))) is a tautology.
While the final answer goes to the specifics of the C++ language you're asking about, here's some food for thought on why (and possibly how) to remember:
Conjunction (AND, &&) is often associated with multiplication, while disjunction (OR, ||) is often associated with addition (and we generally know the precedence of multiplication over addition).
Here's a quote from http://www.ocf.berkeley.edu/~fricke/projects/quinto/dnf.html:
... As a practical matter, we usually associate conjunction with
multiplication and disjunction with addition. Indeed, if we identify
true with 1 and false with 0, then {0,1} coupled with the usual
definitions of addition and multiplication over the Galois field of
size 2 (eg, arithmetic modulo 2), then addition (+) and disjunction
(or) really are the same, as are multiplication and conjunction (and).
...
Speaking in rather general terms, programming languages tend to honor the precedence of multiplicative operators over additive operators.
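As a small illustration of the analogy (my own sketch, with made-up function names), the functions below treat true as 1 and false as 0. One caveat worth keeping in mind: AND matches multiplication exactly, but OR matches addition only if the sum saturates at 1, because addition modulo 2 is XOR and gives false for two true inputs.

```cpp
// AND behaves like multiplication on {0, 1}.
constexpr bool andAsProduct(bool a, bool b) { return int(a) * int(b) == 1; }

// OR behaves like addition that saturates at 1 (1 + 1 still means "true").
constexpr bool orAsSaturatedSum(bool a, bool b) { return int(a) + int(b) >= 1; }

// Addition modulo 2 is XOR, which differs from OR when both inputs are true.
constexpr bool sumModulo2(bool a, bool b) { return (int(a) + int(b)) % 2 == 1; }

// "A && B || C && D" read as the sum of products a*b + c*d: the products are
// formed first, then added, mirroring the precedence of * over +.
constexpr bool sumOfProducts(bool a, bool b, bool c, bool d) {
    return int(a) * int(b) + int(c) * int(d) >= 1;
}
```

Read this way, A && B || C && D groups like a*b + c*d, which is why the parenthesized form (A && B) || (C && D) is the natural one to remember.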
(Further, these associations, e.g. between operators in logic and in algebra, recur in other areas, such as type systems. For an interesting exposition of that, see http://blog.lab49.com/archives/3011 on the notion of Algebraic Type Systems.)