I am unable to comprehend why in the test shown below, iterator p never reaches the end and therefore the loop breaks only when k = 20? What exactly is the push_back doing to cause undefined behavior? Is it because the vector dynamically allocated a bunch of additional storage for the new elements I want to use, and the amount is not necessarily the amount I will use?
#include <iostream>
#include <vector>
#include <list>
using namespace std;
const int MAGIC = 11223344;
void test()
{
bool allValid = true;
int k = 0;
vector<int> v2(5, MAGIC);
k = 0;
for (vector<int>::iterator p = v2.begin(); p != v2.end(); p++, k++)
{
if (k >= 20) // prevent infinite loop
break;
if (*p != MAGIC)
{
cout << "Item# " << k << " is " << *p << ", not " << MAGIC <<"!" << endl;
allValid = false;
}
if (k == 2)
{
for (int i = 0; i < 5; i++)
v2.push_back(MAGIC);
}
}
if (allValid && k == 10)
cout << "Passed test 3" << endl;
else
cout << "Failed test 3" << "\n" << k << endl;
}
int main()
{
test();
}
Insertion to a vector while iterating over it is really a bad idea. Data insertion may cause memory reallocation that invalidates iterators. In this case, the capacity was not enough to insert additional elements, which caused memory allocation with a different address. You can check it yourself:
void test()
{
bool allValid = true;
int k = 0;
vector<int> v2(5, MAGIC);
k = 0;
for (vector<int>::iterator p = v2.begin(); p != v2.end(); p++, k++)
{
cout << v2.capacity() << endl; // Print the vector capacity
if (k >= 20) // prevent infinite loop
break;
if (*p != MAGIC) {
//cout << "Item# " << k << " is " << *p << ", not " << MAGIC <<"!" << endl;
allValid = false;
}
if (k == 2) {
for (int i = 0; i < 5; i++)
v2.push_back(MAGIC);
}
}
if (allValid && k == 10)
cout << "Passed test 3" << endl;
else
cout << "Failed test 3" << "\n" << k << endl;
}
This code will output something like the following:
5
5
5
10 <-- the capacity has changed
10
... skipped ...
10
10
Failed test 3
20
We can see that where k is equal to 2 (third line), the capacity of the vector doubled (fourth line) because we are adding new elements. The memory is redistributed, and the vector elements are most likely now located elsewhere. You can also check it by printing vector base address with data member function instead of capacity:
Address: 0x136dc20 k: 0
Address: 0x136dc20 k: 1
Address: 0x136dc20 k: 2
Address: 0x136e050 k: 3 <-- the address has changed
Address: 0x136e050 k: 4
... skipped ...
Address: 0x136e050 k: 19
Address: 0x136e050 k: 20
Failed test 3
20
The code is poorly written, you can make it more robust by using indices instead of iterators.
I'm making a Mandelbrot set program with CUDA. However I can't step more unless cudaErrorMissingConfiguration from cudaMallocPitch() function of CUDA is to be solved. Could you tell me something about it?
My GPU is GeForce RTX 2060 SUPER.
I'll show you my command lines below.
> nvcc MandelbrotCUDA.cu -o MandelbrotCUDA -O3
I tried cudaDeviceSetLimit( cudaLimitMallocHeapSize, 7*1024*1024*1024 ) to
resize heap size.
cudaDeviceSetLimit was success.
However I cannot step one more. I cannot print "CUDA malloc done!"
#include <iostream>
#include <thrust/complex.h>
#include <fstream>
#include <string>
#include <stdlib.h>
using namespace std;
#define D 0.0000025 // Tick
#define LIMIT_N 255
#define INF_NUM 2
#define PLOT_METHOD 2 // dat file : 0, ppm file : 1, ppm file with C : 2
__global__
void calculation(const int indexTotalX, const int indexTotalY, int ***n, thrust::complex<double> ***c){ // n, c are the pointers of dN, dC.
for(int i = 0; i < indexTotalY ; i++){
for(int j = 0; j < indexTotalX; j++){
thrust::complex<double> z(0.0f, 0.0f);
n[i][j] = 0;
for(int ctr=1; ctr <= LIMIT_N ; ctr++){
z = z*z + (*(c[i][j]));
n[i][j] = n[i][j] + (abs(z) < INF_NUM);
}
}
}
}
int main(){
// Data Path
string filePath = "Y:\\Documents\\Programming\\mandelbrot\\";
string fileName = "mandelbrot4.ppm";
string filename = filePath+fileName;
//complex<double> c[N][M];
double xRange[2] = {-0.76, -0.74};
double yRange[2] = {0.05, 0.1};
const int indexTotalX = (xRange[1]-xRange[0])/D;
const int indexTotalY = (yRange[1]-yRange[0])/D;
thrust::complex<double> **c;
//c = new complex<double> [N];
cout << "debug_n" << endl;
int **n;
n = new int* [indexTotalY];
c = new thrust::complex<double> * [indexTotalY];
for(int i=0;i<indexTotalY;i++){
n[i] = new int [indexTotalX];
c[i] = new thrust::complex<double> [indexTotalX];
}
cout << "debug_n_end" << endl;
for(int i = 0; i < indexTotalY; i++){
for(int j = 0; j < indexTotalX; j++){
thrust::complex<double> tmp( xRange[0]+j*D, yRange[0]+i*D );
c[i][j] = tmp;
//n[i*sqrt(N)+j] = 0;
}
}
// CUDA malloc
cout << "CUDA malloc initializing..." << endl;
int **dN;
thrust::complex<double> **dC;
cudaError_t error;
error = cudaDeviceSetLimit(cudaLimitMallocHeapSize, 7*1024*1024*1024);
if(error != cudaSuccess){
cout << "cudaDeviceSetLimit's ERROR CODE = " << error << endl;
return 0;
}
size_t tmpPitch;
error = cudaMallocPitch((void **)dN, &tmpPitch,(size_t)(indexTotalY*sizeof(int)), (size_t)(indexTotalX*sizeof(int)));
if(error != cudaSuccess){
cout << "CUDA ERROR CODE = " << error << endl;
cout << "indexTotalX = " << indexTotalX << endl;
cout << "indexTotalY = " << indexTotalY << endl;
return 0;
}
cout << "CUDA malloc done!" << endl;
This is console messages below.
debug_n
debug_n_end
CUDA malloc initializing...
CUDA ERROR CODE = 1
indexTotalX = 8000
indexTotalY = 20000
There are several problems here:
int **dN;
...
error = cudaMallocPitch((void **)dN, &tmpPitch,(size_t)(indexTotalY*sizeof(int)), (size_t)(indexTotalX*sizeof(int)));
The correct type of pointer to use in CUDA allocations is a single pointer:
int *dN;
not a double pointer:
int **dN;
(so your kernel where you are trying pass triple-pointers:
void calculation(const int indexTotalX, const int indexTotalY, int ***n, thrust::complex<double> ***c){ // n, c are the pointers of dN, dC.
is almost certainly not going to work, and should not be designed that way, but that is not the question you are asking.)
The pointer is passed to the allocating function by its address:
error = cudaMallocPitch((void **)&dN,
For cudaMallocPitch, only the horizontal requested dimension is scaled by the size of the data element. The allocation height is not scaled this way. Also, I will assume X corresponds to your allocation width, and Y corresponds to your allocation height, so you also have those parameters reversed:
error = cudaMallocPitch((void **)&dN, &tmpPitch,(size_t)(indexTotalX*sizeof(int)), (size_t)(indexTotalY));
The cudaLimitMallocHeapSize should not be necessary to set to make any of this work. It applies only to in-kernel allocations. Reserving 7GB on an 8GB card may also cause problems. Until you are sure you need that (it's not needed for what you have shown) I would simply remove that.
$ cat t1488.cu
#include <iostream>
#include <thrust/complex.h>
#include <fstream>
#include <string>
#include <stdlib.h>
using namespace std;
#define D 0.0000025 // Tick
#define LIMIT_N 255
#define INF_NUM 2
#define PLOT_METHOD 2 // dat file : 0, ppm file : 1, ppm file with C : 2
__global__
void calculation(const int indexTotalX, const int indexTotalY, int ***n, thrust::complex<double> ***c){ // n, c are the pointers of dN, dC.
for(int i = 0; i < indexTotalY ; i++){
for(int j = 0; j < indexTotalX; j++){
thrust::complex<double> z(0.0f, 0.0f);
n[i][j] = 0;
for(int ctr=1; ctr <= LIMIT_N ; ctr++){
z = z*z + (*(c[i][j]));
n[i][j] = n[i][j] + (abs(z) < INF_NUM);
}
}
}
}
int main(){
// Data Path
string filePath = "Y:\\Documents\\Programming\\mandelbrot\\";
string fileName = "mandelbrot4.ppm";
string filename = filePath+fileName;
//complex<double> c[N][M];
double xRange[2] = {-0.76, -0.74};
double yRange[2] = {0.05, 0.1};
const int indexTotalX = (xRange[1]-xRange[0])/D;
const int indexTotalY = (yRange[1]-yRange[0])/D;
thrust::complex<double> **c;
//c = new complex<double> [N];
cout << "debug_n" << endl;
int **n;
n = new int* [indexTotalY];
c = new thrust::complex<double> * [indexTotalY];
for(int i=0;i<indexTotalY;i++){
n[i] = new int [indexTotalX];
c[i] = new thrust::complex<double> [indexTotalX];
}
cout << "debug_n_end" << endl;
for(int i = 0; i < indexTotalY; i++){
for(int j = 0; j < indexTotalX; j++){
thrust::complex<double> tmp( xRange[0]+j*D, yRange[0]+i*D );
c[i][j] = tmp;
//n[i*sqrt(N)+j] = 0;
}
}
// CUDA malloc
cout << "CUDA malloc initializing..." << endl;
int *dN;
thrust::complex<double> **dC;
cudaError_t error;
size_t tmpPitch;
error = cudaMallocPitch((void **)&dN, &tmpPitch,(size_t)(indexTotalX*sizeof(int)), (size_t)(indexTotalY));
if(error != cudaSuccess){
cout << "CUDA ERROR CODE = " << error << endl;
cout << "indexTotalX = " << indexTotalX << endl;
cout << "indexTotalY = " << indexTotalY << endl;
return 0;
}
cout << "CUDA malloc done!" << endl;
}
$ nvcc -o t1488 t1488.cu
t1488.cu(68): warning: variable "dC" was declared but never referenced
$ cuda-memcheck ./t1488
========= CUDA-MEMCHECK
debug_n
debug_n_end
CUDA malloc initializing...
CUDA malloc done!
========= ERROR SUMMARY: 0 errors
$
main.cpp
#include "Primes.h"
#include <iostream>
int main(){
std::string choose;
int num1, num2;
while(1 == 1){
std::cout << "INSTRUCTIONS" << std::endl << "Enter:" << std::endl
<< "'c' to check whether a number is a prime," << std::endl
<< "'u' to view all the prime numbers between two numbers "
<< "that you want," << std::endl << "'x' to exit,"
<< std::endl << "Enter what you would like to do: ";
std::cin >> choose;
std::cout << std::endl;
if(choose == "c"){
std::cout << "Enter number: ";
std::cin >> num1;
Primes::checkPrimeness(num1) == 1 ?
std::cout << num1 << " is a prime." << std::endl << std::endl :
std::cout << num1 << " isn't a prime." << std::endl << std::endl;
}else if(choose == "u"){
std::cout << "Enter the number you want to start seeing primes "
<< "from: ";
std::cin >> num1;
std::cout << "\nEnter the number you want to stop seeing primes "
<< "till: ";
std::cin >> num2;
std::cout << std::endl;
for(num1; num1 <= num2; num1++){
Primes::checkPrimeness(num1) == 1 ?
std::cout << num1 << " is a prime." << std::endl :
std::cout << num1 << " isn't a prime." << std::endl;
}
}else if(choose == "x"){
return 0;
}
std::cout << std::endl;
}
}
Primes.h
#ifndef PRIMES_H
#define PRIMES_H
namespace Primes{
extern int num, count;
extern bool testPrime;
// Returns true if the number is a prime and false if it isn't.
int checkPrimeness(num);
}
#endif
Primes.cpp
#include "Primes.h"
#include <iostream>
int Primes::checkPrimeness(num){
if(num < 2){
return(0);
}else if(num == 2){
return(1);
}else{
for(count = 0; count < num; count++){
for(count = 2; count < num; count++){
if(num % count == 0){
return(0);
}else{
testPrime = true;
if(count == --num && testPrime == true){
return(1);
}
}
}
}
}
}
I get the following 3 errors: Errors from terminal
I've spent hours for days and still can't seem to fix the errors.
I've tried using extern and pretty much everything I can imagine.
Here is an error in function declaration:
int checkPrimeness(num);
defines a global integer variable checkPrimeness initialized with num! To declare a function you just should change it like:
int checkPrimeness(int);
Can't understand why you declare parameters as external variables. To split declarations and realization you should declare all functions and classes inside header file, and define them inside source file.
I have the following code to copy from a host variable to a __constant__ variable in CUDA
int main(int argc, char **argv){
int exit_code;
if (argc < 4) {
std::cout << "Usage: \n " << argv[0] << " <input> <output> <nColors>" << std::endl;
return 1;
}
Color *h_input;
int h_rows, h_cols;
timer1.Start();
exit_code = readText2RGB(argv[1], &h_input, &h_rows, &h_cols);
timer1.Stop();
std::cout << "Reading: " << timer1.Elapsed() << std::endl;
if (exit_code != SUCCESS){
std::cout << "Error trying to read file." << std::endl;
return FAILURE;
}
CpuTimer timer1;
GpuTimer timer2;
float timeStep2 = 0, timeStep3 = 0;
int h_numColors = atoi(argv[3]);
int h_change = 0;
int *h_pixelGroup = new int[h_rows*h_cols];
Color *h_groupRep = new Color[h_numColors];
Color *h_output = new Color[h_rows*h_cols];
Color *d_input;
int *d_pixelGroup;
Color *d_groupRep;
Color *d_output;
dim3 block(B_WIDTH, B_HEIGHT);
dim3 grid((h_cols+B_WIDTH-1)/B_WIDTH, (h_rows+B_HEIGHT-1)/B_HEIGHT);
checkCudaError(cudaMalloc((void**)&d_input, sizeof(Color)*h_rows*h_cols));
checkCudaError(cudaMalloc((void**)&d_pixelGroup, sizeof(int)*h_rows*h_cols));
checkCudaError(cudaMalloc((void**)&d_groupRep, sizeof(Color)*h_numColors));
checkCudaError(cudaMalloc((void**)&d_output, sizeof(Color)*h_rows*h_cols));
// STEP 1
//Evenly distribute all pixels of the image onto the color set
timer2.Start();
checkCudaError(cudaMemcpyToSymbol(c_rows, &h_rows, sizeof(int)));
checkCudaError(cudaMemcpyToSymbol(c_cols, &h_cols, sizeof(int)));
checkCudaError(cudaMemcpyToSymbol(c_numColors, &h_numColors, sizeof(int)));
checkCudaError(cudaMemcpy(d_input, h_input, sizeof(Color)*h_rows*h_cols, cudaMemcpyHostToDevice));
clut_distributePixels<<<grid, block>>>(d_pixelGroup);
checkCudaError(cudaMemcpy(h_pixelGroup, d_pixelGroup, sizeof(int)*h_rows*h_cols, cudaMemcpyDeviceToHost));
timer2.Stop();
std::cout << "Phase 1: " << timer2.Elapsed() << std::endl;
std::cout << h_pixelGroup[0] << ","
<< h_pixelGroup[3] << ","
<< h_pixelGroup[4] << ","
<< h_pixelGroup[7] << ","
<< h_pixelGroup[8] << std::endl;
//Do the STEP 2 and STEP 3 as long as there is at least one change of representative in a group
do {
// STEP 2
//Set the representative value to the average colour of all pixels in the same set
timer1.Start();
for (int ng = 0; ng < h_numColors; ng++) {
int r = 0, g = 0, b = 0;
int elem = 0;
for (int i = 0; i < h_rows; i++) {
for (int j = 0; j < h_cols; j++) {
if (h_pixelGroup[i*h_cols+j] == ng) {
r += h_input[i*h_cols+j].r;
g += h_input[i*h_cols+j].g;
b += h_input[i*h_cols+j].b;
elem++;
}
}
}
if (elem == 0) {
h_groupRep[ng].r = 255;
h_groupRep[ng].g = 255;
h_groupRep[ng].b = 255;
}else{
h_groupRep[ng].r = r/elem;
h_groupRep[ng].g = g/elem;
h_groupRep[ng].b = b/elem;
}
}
timer1.Stop();
timeStep2 += timer1.Elapsed();
// STEP 3
//For each pixel in the image, compute Euclidean's distance to each representative
//and assign it to the set which is closest
h_change = 0;
timer2.Start();
checkCudaError(cudaMemcpyToSymbol(d_change, &h_change, sizeof(int)));
checkCudaError(cudaMemcpy(d_groupRep, h_groupRep, sizeof(Color)*h_numColors, cudaMemcpyHostToDevice));
clut_checkDistances<<<grid, block>>>(d_input, d_pixelGroup, d_groupRep);
checkCudaError(cudaMemcpy(h_pixelGroup, d_pixelGroup, sizeof(int)*h_rows*h_cols, cudaMemcpyDeviceToHost));
checkCudaError(cudaMemcpyFromSymbol(&h_change, d_change, sizeof(int)));
timer2.Stop();
timeStep3 += timer2.Elapsed();
std::cout << "Chunche" << std::endl;
} while (h_change == 1);
std::cout << "Phase 2: " << timeStep2 << std::endl;
std::cout << "Phase 3: " << timeStep3 << std::endl;
// STEP 4
//Create the new image with the resulting color lookup table
timer2.Start();
clut_createImage<<<grid, block>>>(d_output, d_pixelGroup, d_groupRep);
checkCudaError(cudaMemcpy(h_output, d_output, sizeof(Color)*h_rows*h_cols, cudaMemcpyDeviceToHost));
timer2.Stop();
std::cout << "Phase 4: " << timer2.Elapsed() << std::endl;
checkCudaError(cudaFree(d_input));
checkCudaError(cudaFree(d_pixelGroup));
checkCudaError(cudaFree(d_groupRep));
checkCudaError(cudaFree(d_output));
timer1.Start();
exit_code = writeRGB2Text(argv[2], h_input, h_rows, h_cols);
timer1.Stop();
std::cout << "Writing: " << timer1.Elapsed() << std::endl;
delete[] h_pixelGroup;
delete[] h_groupRep;
delete[] h_output;
return SUCCESS;
}
when I print from within the kernel I get zeros for the three values
__global__
void clut_distributePixels(int *pixelGroup){
int i = blockDim.y * blockIdx.y + threadIdx.y;
int j = blockDim.x * blockIdx.x + threadIdx.x;
if(i == 0 && j == 0){
printf("a: %d\n", c_rows);
printf("b: %d\n", c_cols);
printf("c: %d\n", c_numColors);
}
while (i < c_rows) {
while (j < c_cols) {
pixelGroup[i*c_cols+j] = (i*c_cols+j)/c_numColors;
j += gridDim.x * blockDim.x;
}
j = blockDim.x * blockIdx.x + threadIdx.x;
i += gridDim.y * blockDim.y;
}
}
Either I am not copying correctly to constant memory or ... I don't know what could be wrong. Any advise !?
I posted the entire host code probably something else is messing with the constant copies.
UPDATE
Main.cu
#include "Imageproc.cuh"
int main(){
int h_change = 0;
int h_rows = 512;
cudaMemcpyToSymbol(c_rows, &h_rows, sizeof(int));
chunche<<<1,1>>>();
cudaMemcpyFromSymbol(&h_change, d_change, sizeof(int));
std::cout << "H = " << h_change << std::endl;
return 0
}
Imageproc.cuh
#ifndef _IMAGEPROC_CUH_
#define _IMAGEPROC_CUH_
#include "Utilities.cuh"
#define B_WIDTH 16
#define B_HEIGHT 16
__constant__ int c_rows;
__constant__ int c_cols;
__constant__ int c_numColors;
__device__ int d_change;
#ifdef __cplusplus
extern "C"
{
#endif
__global__
void chunche();
__global__
void clut_distributePixels(int *pixelGroup);
__global__
void clut_checkDistances(Color *input, int *pixelGroup, Color *groupRep);
__global__
void clut_createImage(Color *clutImage, int *pixelGroup, Color *groupRep);
#ifdef __cplusplus
}
#endif
#endif
Imageproc.cu
#include "Imageproc.cuh"
__global__
void chunche(){
d_change = c_rows + 1;
}
__global__
void clut_distributePixels(int *pixelGroup){
int i = blockDim.y * blockIdx.y + threadIdx.y;
int j = blockDim.x * blockIdx.x + threadIdx.x;
while (i < c_rows) {
while (j < c_cols) {
pixelGroup[i*c_cols+j] = (i*c_cols+j)/c_numColors;
j += gridDim.x * blockDim.x;
}
j = blockDim.x * blockIdx.x + threadIdx.x;
i += gridDim.y * blockDim.y;
}
}
__global__
void clut_checkDistances(Color *input, int *pixelGroup, Color *groupRep){
int i = blockDim.y * blockIdx.y + threadIdx.y;
int j = blockDim.x * blockIdx.x + threadIdx.x;
int newGroup;
while (i < c_rows) {
while (j < c_cols) {
newGroup = 0;
for (int ng = 1; ng < c_numColors; ng++) {
if (
/*If distance from color to group ng is less than distance from color to group idx
then color should belong to ng*/
(groupRep[ng].r-input[i*c_cols+j].r)*(groupRep[ng].r-input[i*c_cols+j].r) +
(groupRep[ng].g-input[i*c_cols+j].g)*(groupRep[ng].g-input[i*c_cols+j].g) +
(groupRep[ng].b-input[i*c_cols+j].b)*(groupRep[ng].b-input[i*c_cols+j].b)
<
(groupRep[newGroup].r-input[i*c_cols+j].r)*(groupRep[newGroup].r-input[i*c_cols+j].r)+
(groupRep[newGroup].g-input[i*c_cols+j].g)*(groupRep[newGroup].g-input[i*c_cols+j].g)+
(groupRep[newGroup].b-input[i*c_cols+j].b)*(groupRep[newGroup].b-input[i*c_cols+j].b)
)
{
newGroup = ng;
}
}
if (pixelGroup[i*c_cols+j] != newGroup) {
pixelGroup[i*c_cols+j] = newGroup;
d_change = 1;
}
j += gridDim.x * blockDim.x;
}
j = blockDim.x * blockIdx.x + threadIdx.x;
i += gridDim.y * blockDim.y;
}
}
__global__
void clut_createImage(Color *clutImage, int *pixelGroup, Color *groupRep){
int i = blockDim.y * blockIdx.y + threadIdx.y;
int j = blockDim.x * blockIdx.x + threadIdx.x;
while (i < c_rows) {
while (j < c_cols) {
clutImage[i*c_cols+j].r = groupRep[pixelGroup[i*c_cols+j]].r;
clutImage[i*c_cols+j].g = groupRep[pixelGroup[i*c_cols+j]].g;
clutImage[i*c_cols+j].b = groupRep[pixelGroup[i*c_cols+j]].b;
j += gridDim.x * blockDim.x;
}
j = blockDim.x * blockIdx.x + threadIdx.x;
i += gridDim.y * blockDim.y;
}
}
Utilities.cuh
#ifndef _UTILITIES_CUH_
#define _UTILITIES_CUH_
#include <iostream>
#include <fstream>
#include <string>
#define SUCCESS 1
#define FAILURE 0
#define checkCudaError(val) check( (val), #val, __FILE__, __LINE__)
typedef struct {
int r;
int g;
int b;
} vec3u;
typedef vec3u Color;
typedef unsigned char uchar;
typedef uchar Grayscale;
struct GpuTimer{
cudaEvent_t start;
cudaEvent_t stop;
GpuTimer(){
cudaEventCreate(&start);
cudaEventCreate(&stop);
}
~GpuTimer(){
cudaEventDestroy(start);
cudaEventDestroy(stop);
}
void Start(){
cudaEventRecord(start, 0);
}
void Stop(){
cudaEventRecord(stop, 0);
}
float Elapsed(){
float elapsed;
cudaEventSynchronize(stop);
cudaEventElapsedTime(&elapsed, start, stop);
return elapsed;
}
};
template<typename T>
void check(T err, const char* const func, const char* const file, const int line) {
if (err != cudaSuccess) {
std::cerr << "CUDA error at: " << file << ":" << line << std::endl;
std::cerr << cudaGetErrorString(err) << " " << func << std::endl;
exit(1);
}
}
int writeGrayscale2Text(const std::string filename, const Grayscale *image, const int rows, const int cols);
int readText2Grayscale(const std::string filename, Grayscale **image, int *rows, int *cols);
int writeRGB2Text(const std::string filename, const Color *image, const int rows, const int cols);
int readText2RGB(const std::string filename, Color **image, int *rows, int *cols);
struct CpuTimer{
clock_t start;
clock_t stop;
void Start(){
start = clock();
}
void Stop(){
stop = clock();
}
float Elapsed(){
return ((float)stop-start)/CLOCKS_PER_SEC * 1000.0f;
}
};
#endif
Utilities.cu
#include "Utilities.cuh"
int writeGrayscale2Text(const std::string filename, const Grayscale *image, const int rows, const int cols){
std::ofstream fileWriter(filename.c_str());
if (!fileWriter.is_open()) {
std::cerr << "** writeGrayscale2Text() ** : Unable to open file." << std::endl;
return FAILURE;
}
fileWriter << rows << "\n";
fileWriter << cols << "\n";
for (int i = 0; i < rows; i++) {
for (int j = 0; j < cols; j++) {
fileWriter << (int)image[i*cols+j] << "\n";
}
}
fileWriter.close();
return SUCCESS;
}
int readText2Grayscale(const std::string filename, Grayscale **image, int *rows, int *cols){
std::ifstream fileReader(filename.c_str());
if (!fileReader.is_open()) {
std::cerr << "** readText2Grayscale() ** : Unable to open file." << std::endl;
return FAILURE;
}
fileReader >> *rows;
fileReader >> *cols;
*image = new Grayscale[(*rows)*(*cols)];
int value;
for (int i = 0; i < *rows; i++) {
for (int j = 0; j < *cols; j++) {
fileReader >> value;
(*image)[i*(*cols)+j] = (Grayscale)value;
}
}
fileReader.close();
return SUCCESS;
}
int writeRGB2Text(const std::string filename, const Color *image, const int rows, const int cols){
std::ofstream fileWriter(filename.c_str());
if (!fileWriter.is_open()) {
std::cerr << "** writeRGB2Text() ** : Unable to open file." << std::endl;
return FAILURE;
}
fileWriter << rows << "\n";
fileWriter << cols << "\n";
for (int k = 0; k < 3; k++) {
for (int i = 0; i < rows; i++) {
for (int j = 0; j < cols; j++) {
switch (k) {
case 0:
fileWriter << image[i*cols+j].r << "\n";
break;
case 1:
fileWriter << image[i*cols+j].g << "\n";
break;
case 2:
fileWriter << image[i*cols+j].b << "\n";
break;
}
}
}
}
fileWriter.close();
return SUCCESS;
}
int readText2RGB(const std::string filename, Color **image, int *rows, int *cols){
std::ifstream fileReader(filename.c_str());
if (!fileReader.is_open()) {
std::cerr << "** readText2Grayscale() ** : Unable to open file." << std::endl;
return FAILURE;
}
fileReader >> *rows;
fileReader >> *cols;
*image = new Color[(*rows)*(*cols)];
for (int k = 0; k < 3; k++) {
for (int i = 0; i < *rows; i++) {
for (int j = 0; j < *cols; j++) {
switch (k) {
case 0:
fileReader >> (*image)[i*(*cols)+j].r;
break;
case 1:
fileReader >> (*image)[i*(*cols)+j].g;
break;
case 2:
fileReader >> (*image)[i*(*cols)+j].b;
break;
}
}
}
}
fileReader.close();
return SUCCESS;
}
Constant memory has implicit local scope linkage - answer to this on stack overflow.
This means that the cudaMemcpyToSymbol have to be in the same generated .obj file of the kernel where you want to use it.
You do your memcopy in Main.cu, but the kernel where you use your canstant memory is in Imageproc.cu. So for the constant values are unknown for the kernel chunche.
A option to solve you're problem can be, to implement a wrapper. Just add a function in Imagepro.cu where you do the cudaMemcpyToSymbol and call the wrapper in Main.cu and pass your desired values for the constant memory in there.
I have written a code using Thrust. I am pasting the code and its output below. Strangely, when the device_vector line is reached during exectution the screen just hangs and no more output comes. It was working in the morning. Please help me.
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <iostream>
int main(void)
{
// H has storage for 4 integers
thrust::host_vector<int> H(4);
// initialize individual elements
H[0] = 14;
H[1] = 20;
H[2] = 38;
H[3] = 46;
// H.size() returns the size of vector H
std::cout << "H has size " << H.size() << std::endl;
// print contents of H
for(size_t i = 0; i < H.size(); i++)
std::cout << "H[" << i << "] = " << H[i] << std::endl;
// resize H
H.resize(2);
std::cout << "H now has size " << H.size() << std::endl;
// Copy host_vector H to device_vector D
thrust::device_vector<int> D = H;
// elements of D can be modified
D[0] = 99;
D[1] = 88;
// print contents of D
for(size_t i = 0; i < D.size(); i++)
std::cout << "D[" << i << "] = " << D[i] << std::endl;
// H and D are automatically deleted when the function returns
return 0;
}
The output is :
H has size 4
H[0] = 14
H[1] = 20
H[2] = 38
H[3] = 46
H now has size 2
* After this nothing happens
Run Device Query. I am confident that if the code was working in the morning, the problem is due to the graphics card.