Thanks, everyone, for helping. I will keep looking at it so I can understand it better! I am still struggling with recursion, but I will study it more. Thanks again for all the time and effort you put into helping me.
int countEven(int arr[i]){
//I'm not sure what to do here... how to fix it...
int evens = 0;
if(arr[i] <= 0) return 0; //base case
while(arr[i] > 0){
int digit = arr[i]%10; //get the last digit
if(digit%2 == 0){
evens = evens+1;
}
arr[i] = arr[i]/10;
}
cout << evens;
}
}
}
int main(){
cout << "Part A:\n";
int arr[3] = { 5050155, 5, 707070 };
for (int i = 0; i < 3; i++){
cout << "countEven(" << arr[i] << ") = " << countEven(arr[i]) << endl;
cout << "removeEven(" << arr[i] << ") = " << removeEven(arr[i]) << endl;
cout << "hasEven(" << arr[i] << ") = ";
if (hasEven(arr[i])) cout << "Yes" << endl;
else cout << "No" << endl;
printStarDigit(arr[i]);
cout << endl << endl;
}
cout << "Part B:\n";
int a[4] = { 7, 2, 8, 3 };
int b[5] = { 3, 4, 5, 6, 7 };
cout << "The range of array a is " << range(a, 4) << endl;
cout << "The range of array b is " << range(b, 5) << endl;
reverse(a, 4);
reverse(b, 5);
cout << "Array a reversed: ";
for (int i = 0; i < 4; ++i)
cout << a[i] << " ";
cout << endl;
cout << "Array b reversed: ";
for (int i = 0; i < 5; ++i)
cout << b[i] << " ";
cout << endl;
return 0;
}
int countEven(int arr[i]){
Parameters must have simple names, and do not need to be identical to the expressions passed in. arr[i] is not a valid name. A simple name to use here is n.
//I'm not sure what to do here... how to fix it...
int evens = 0;
if(arr[i] <= 0) return 0; //base case
This base case is wrong for two reasons. Firstly, you treat all negative integers as the base case, but -202020 has six even digits. Secondly, you return the wrong value: 0 has one even digit, but you return zero.
A possible base case could be n > -10 && n < 10 (single digit number). I'll let you figure out the expression to return for that base case.
while(arr[i] > 0){
If your task is to write a recursive function, then you shouldn't use a loop here. Instead, see below.
int digit = arr[i]%10; //get the last digit
...
arr[i] = arr[i]/10;
This is a correct way of obtaining the last digit, and everything other than the last digit.
if(digit%2 == 0)
This is a correct way of determining whether the last digit is even.
Now, you need to combine what you have, by observing that the count of even digits is equal to "1 if the last digit is even, else 0", plus the count of even non-last digits. The goal of the exercise is to get you to write "the count of even non-last digits" as countEven(n / 10).
See these code snippets.
int countEven(int n){
// 1 if the last digit is even, else 0 (also works for negative n)
int lastIsEven = (n % 10) % 2 == 0 ? 1 : 0;
// base case: a single-digit number
if (n > -10 && n < 10) {
return lastIsEven;
}
// recursive case: the last digit plus the count for the remaining digits
return lastIsEven + countEven(n / 10);
}
int removeEven(int n){
int result = 0;
/*TODO*/
return result;
}
bool hasEven(int n){
return countEven(n) > 0; // true when at least one digit is even
}
void printStarDigit(int* arr){
/*TODO*/
}
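int range(int* arr, int n){
// assuming "range" means largest element minus smallest element
int lo = *arr;
int hi = *arr;
for (int i = 1; i < n; i++) {
if (*(arr + i) < lo) lo = *(arr + i);
if (*(arr + i) > hi) hi = *(arr + i);
}
return hi - lo;
}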
void reverse(int* arr, int n){
int* temp = new int[n];
for (int i = 0; i < n; i++) {
*(temp + i) = *(arr + i);
}
for (int i = 0; i < n; i++) {
*(arr + i) = *(temp + n - 1 - i);
}
delete[] temp;
}
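The removeEven stub above can follow the same "last digit plus the rest" recursion. The sketch below is only a guess at the intended behaviour: it assumes removeEven(n) should return n with its even digits dropped (so an all-even number becomes 0), which the original post does not spell out.

```cpp
// Sketch under an assumption: removeEven(n) drops every even digit of n
// and keeps the odd digits in their original order.
int removeEven(int n) {
    // base case: no digits left
    if (n == 0) return 0;
    // recurse on everything except the last digit
    int rest = removeEven(n / 10);
    int digit = n % 10;
    // skip an even last digit, otherwise append it to the rest
    return digit % 2 == 0 ? rest : rest * 10 + digit;
}
```

With this interpretation, the three test values in main would give 55155, 5 and 777.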
Main implementation:
int main() {
cout << "Part A:\n";
int arr[3] = { 5050155, 5, 707070 };
for (int i = 0; i < 3; i++) {
cout << "countEven(" << arr[i] << ") = " << countEven(arr[i]) << endl;
cout << "removeEven(" << arr[i] << ") = " << removeEven(arr[i]) << endl;
cout << "hasEven(" << arr[i] << ") = ";
if (hasEven(arr[i])) cout << "Yes" << endl;
else cout << "No" << endl;
printStarDigit(arr + i);
cout << endl << endl;
}
cout << "Part B:\n";
int a[4] = { 7, 2, 8, 3 };
int b[5] = { 3, 4, 5, 6, 7 };
cout << "The range of array a is " << range(a, 4) << endl;
cout << "The range of array b is " << range(b, 5) << endl;
reverse(a, 4);
reverse(b, 5);
cout << "Array a reversed: ";
for (int i = 0; i < 4; ++i)
cout << a[i] << " ";
cout << endl;
cout << "Array b reversed: ";
for (int i = 0; i < 5; ++i)
cout << b[i] << " ";
cout << endl;
return 0;
}
I have code that converts a decimal number to a binary number, but because the number is huge (about 1.4e8), it either always gives me a rounded number or uses too much memory and crashes. Here is my code:
double n = 1466333969;
double e = 10487;
double facteurs = 0;
long d = 0;
facteurs = factorisation(n);
d = calculerD(facteurs, e);
long dTemp = d / 2;
int dBin[70] = {};
int i = 0;
cout << dTemp << endl;
do
{
if (dTemp == floor(d / 2))
{
//cout << D << " " << dTemp << endl;
dBin[i] = 0;
}
else dBin[i] = 1;
d = floor(d / 2);
dTemp = d / 2;
//cout << D << " " << dBin[i] << endl;
i++;
} while (d != 0);
I checked, and the calculerD function works fine and returns the right d, but afterwards it just seems to stay in longs.
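For comparison, here is a minimal sketch of the same conversion done purely with integer arithmetic, so no floor or double is involved. The name toBinary and the choice of an unsigned 64-bit type are assumptions for illustration, not part of the original code.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: returns the binary digits of d, most significant bit first.
std::string toBinary(std::uint64_t d) {
    if (d == 0) return "0";
    std::string bits;
    while (d != 0) {
        // the lowest bit is the remainder of dividing by 2
        bits.insert(bits.begin(), (d & 1u) ? '1' : '0');
        // integer shift: exact division by 2, nothing to round
        d >>= 1;
    }
    return bits;
}
```

A 64-bit value needs at most 64 characters, so memory use stays small regardless of the input.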
This program is supposed to use a single function to return multiple values. Since a function can return only one value at a time, I have implemented a switch. However, when I run the code, the gross pay, the taxes, and the net pay all come out as 0 instead of adding up. Does anyone know why this occurs? I am only allowed to use a single function with a switch to return the different values.
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <iomanip>
#include <string.h>
using namespace std;
double paycalculations(double, double, double, double, int);
double paycalculations(double hrs, double pay, double fedrate, double staterate, int option)
{
switch (option)
{
case 1:
double reg;
if (hrs < 0)
{
hrs = hrs * (-1);
}
if (hrs <= 40)
{
reg = hrs * pay;
}
else
{
reg = 40 * pay;
}
return reg;
break;
case 2:
double overtime;
overtime = (hrs - 40) * (1.5 * pay);
return overtime;
break;
case 3:
double gross;
if (overtime > 0)
{
gross = reg + overtime;
}
else
{
gross = reg;
}
return gross;
break;
case 4:
double ftaxes;
ftaxes = gross * fedrate;
return ftaxes;
break;
case 5:
double staxes;
staxes = gross * staterate;
return staxes;
break;
case 6:
double socialsectaxes;
socialsectaxes = gross * 0.062;
return socialsectaxes;
break;
case 7:
double medtaxes;
medtaxes = gross * 0.0145;
return medtaxes;
break;
case 8:
double net;
net = gross - (ftaxes + staxes + socialsectaxes + medtaxes);
return net;
break;
}
}
int main ()
{
string employeeID;
double hours, payrate, fedtaxrate, statetaxrate, regpay, overtimepay, grosspay, fedtaxes, statetaxes, sosectaxes, meditaxes, netpay;
const int SIZE = 100;
char again = 'y', fullname[SIZE] = {" "};
while (again == 'y')
{
hours = payrate = fedtaxrate = statetaxrate = regpay = overtimepay = fedtaxes = statetaxes = sosectaxes = meditaxes = netpay = 0;
system("reset");
cout <<"Employee Full Name: ";
cin.getline(fullname, SIZE);
cout << "\nEmployee ID: ";
cin >> employeeID;
cout << "\nHours Worked (Maximum of 60): ";
cin >> hours;
cout << "\nEmployee Payrate: $";
cin >> payrate;
cout << "\nFederal Tax Rate (Maximum of 0.5 or 50%): ";
cin >> fedtaxrate;
cout << "\nState Tax Rate (Maximum of 0.3 or 30%): ";
cin >> statetaxrate;
cout.setf(ios::fixed | ios::showpoint);
system("reset");
cout << "\t\t\t\t\tPayroll";
cout << "\n\nEmployee ID: " << employeeID;
cout << "\t\nEmployee Name: " << fullname;
cout.precision(1);
cout << "\n\nTotal Hours Worked: " << "\t\t\t\t" << hours;
cout.precision(2);
regpay = paycalculations(hours, payrate, fedtaxrate, statetaxrate, 1);
if (hours <= 40)
{
cout << "\nRegular Hours Worked: " << hours << " # $" << payrate << "/hour" << "\t" << "$" << regpay;
}
overtimepay = paycalculations(hours, payrate, fedtaxrate, statetaxrate, 2);
if (hours > 40)
{
cout << "\nRegular Hours Worked: " << "40.00 # $" << payrate << "/hour" << "\t" << "$" << regpay;
cout << "\nOvertime Hours Worked: " << (hours - 40) << " # $" << (payrate * 1.5) << "/hour" << "\t" << "$" << overtimepay;
}
grosspay = paycalculations(hours, payrate, fedtaxrate, statetaxrate, 3);
cout << "\nGross Pay: " << "\t\t\t\t\t" << "$" << grosspay;
fedtaxes = paycalculations(hours, payrate, fedtaxrate, statetaxrate, 4);
cout << "\nFederal Taxes Withheld #" << (fedtaxrate * 100) << "%" << "\t\t\t" << "$" << fedtaxes;
statetaxes = paycalculations(hours, payrate, fedtaxrate, statetaxrate, 5);
cout << "\nState Taxes Withheld #" << (statetaxrate * 100) << "%" << "\t\t\t" << "$" << statetaxes;
sosectaxes = paycalculations(hours, payrate, fedtaxrate, statetaxrate, 6);
cout << "\nSocial Security Withheld #" << (0.062 * 100) << "%" << "\t\t\t" << "$" << sosectaxes;
meditaxes = paycalculations(hours, payrate, fedtaxrate, statetaxrate, 7);
cout << "\nMedicare Withheld #" << (0.0145 * 100) << "%" << "\t\t\t" << "$" << meditaxes;
netpay = paycalculations(hours, payrate, fedtaxrate, statetaxrate, 8);
cout << "\nNet Pay: " << "\t\t\t\t\t" << "$" << netpay;
cout << "\n\nWould you like to compute another employee's pay check? (Y or N): ";
cin >> again;
again = tolower(again);
}
}
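One thing worth noting about the code above: the local variables inside paycalculations (reg, overtime, gross and so on) do not keep their values between calls, so case 3 and later read variables that were never assigned in that call. A possible sketch that keeps the single-function-with-a-switch constraint is to derive every value from the arguments on each call. The zero overtime for 40 hours or fewer and the default branch are assumptions added here.

```cpp
#include <cmath>

double paycalculations(double hrs, double pay, double fedrate, double staterate, int option)
{
    hrs = std::fabs(hrs); // treat negative hours as positive, as the original does
    // Recompute every intermediate value from the inputs on each call.
    const double reg = (hrs <= 40 ? hrs : 40) * pay;
    const double overtime = hrs > 40 ? (hrs - 40) * 1.5 * pay : 0.0; // assumption: no overtime below 40 h
    const double gross = reg + overtime;
    const double ftaxes = gross * fedrate;
    const double staxes = gross * staterate;
    const double socialsectaxes = gross * 0.062;
    const double medtaxes = gross * 0.0145;
    switch (option)
    {
    case 1: return reg;
    case 2: return overtime;
    case 3: return gross;
    case 4: return ftaxes;
    case 5: return staxes;
    case 6: return socialsectaxes;
    case 7: return medtaxes;
    case 8: return gross - (ftaxes + staxes + socialsectaxes + medtaxes);
    default: return 0.0; // unknown option
    }
}
```

The calls in main can stay as they are, since the signature is unchanged.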
I am relatively new to C++ programming and I am struggling with my code. The objective of this code is to take scores entered by the user, calculate the mean and the standard deviation, and convert each score to a letter grade using the calculations in char gradeFunction. When I try to debug this program in Visual Studio 2013, I run into a couple of problems with gradeFunction. Again, I am new to programming, so troubleshooting errors is very hard for me, and I would appreciate any help or advice! The program looks like this so far.
#include <iostream>
#include <iomanip>
#include <cmath>
#include <string.h>
#include <string>
using namespace std;
void printArray(int Array[], int count);
double average(double scoreTotal, int count);
double stddev(int Array[], int count, double mean);
char gradeFunction(int scores, double stddev, double mean);
int main()
{
int scores[8];
int count;
double scoreTotal = 0;
int standarddev[8];
double mean;
cout << "Enter scores seperated by blanks:" " ";
for (count = 0; count <= 7; count++)
{
cin >> scores[count];
scoreTotal += scores[count];
mean = scoreTotal / 8;
}
cout << endl;
cout << "Grade Scores by Student" << endl;
cout << "Score" "\t" "Grade" << endl;
cout << "----------------------------------" << endl;
printArray(scores, 8);
cout << gradeFunction(scores, stddev, mean);
cout << endl;
cout << "The mean is" " "<< fixed << setprecision(1) << average(scoreTotal, count) << endl;
cout << "The standard deviation is" " " << stddev(scores, count, mean) << endl;
cout << endl;
system("pause");
return 0;
}
void printArray(int Array[], int count)
{
for (int x = 0; x < count; x++)
{
cout << fixed << setprecision(1) << Array[x] << endl;
}
}
char gradeFunction(int scores, double stddev, double mean)
{
char F, D, C, B, A;
if (scores <= (mean - (1.5 * stddev)))
return 'F';
else if (scores <= (mean - (.5 * stddev)))
return 'D';
else if (scores <= (mean + (.5 * stddev)))
return 'C';
else if (scores <= (mean + (1.5 * stddev)))
return 'B';
else return 'A';
}
double average(double scoreTotal, int count)
{
return scoreTotal / count;
}
double stddev(int Array[], int count , double mean)
{
double stddev;
double sum2 = 0;
for (int i = 0; i < count; i++)
{
sum2 += pow((Array[i] - mean), 2);
}
stddev = sqrt(sum2 / (count - 1));
return stddev;
}
The error messages this leaves me with are...
3 IntelliSense: argument of type "double (*)(int *Array, int count, double mean)" is incompatible with parameter of type "double"
Error 1 error C2664: 'char gradeFunction(int [],double,double)' : cannot convert argument 2 from 'double (__cdecl *)(int [],int,double)' to 'double'
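Both messages point at the call gradeFunction(scores, stddev, mean): there, stddev is the name of the function itself (hence the function-pointer type in the error), not a computed value, and scores is the whole array rather than one score. A minimal sketch of one way to wire this up follows; the names sampleStdDev and gradesFor and the use of std::vector are assumptions for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Sample standard deviation, same formula as the original stddev().
double sampleStdDev(const std::vector<int>& scores, double mean) {
    double sum2 = 0;
    for (int s : scores) sum2 += std::pow(s - mean, 2);
    return std::sqrt(sum2 / (scores.size() - 1));
}

// Same thresholds as the original gradeFunction, for a single score.
char gradeFunction(int score, double sd, double mean) {
    if (score <= mean - 1.5 * sd) return 'F';
    if (score <= mean - 0.5 * sd) return 'D';
    if (score <= mean + 0.5 * sd) return 'C';
    if (score <= mean + 1.5 * sd) return 'B';
    return 'A';
}

// Hypothetical driver: compute the mean and deviation once, then grade each score.
std::string gradesFor(const std::vector<int>& scores) {
    double total = 0;
    for (int s : scores) total += s;
    const double mean = total / scores.size();
    const double sd = sampleStdDev(scores, mean); // call the function, pass its result
    std::string grades;
    for (std::size_t i = 0; i < scores.size(); ++i)
        grades += gradeFunction(scores[i], sd, mean);
    return grades;
}
```

The key point is that the standard deviation is computed first and its double result is passed on, one score at a time.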
I need to write CUDA code that searches for a set of keyword strings inside a set of data strings and returns an array of booleans, one per keyword-data string pair.
Data strings: currently 10000 (may vary), each at most 250 chars.
Keyword strings: currently 100 (may vary), each at most 100 chars.
The length of each string is known.
My question is which of the following approaches might be more suitable in this case.
1st:
gridDim.x => # of keyword strings
gridDim.y => # of data strings
blockDim => (max string size(250 in this case),1,1)
Naive algorithm will be used for search
Each thread will load the chars of keyword and data to shared mem from global mem.
Each thread will be responsible for one of the windows in naive search algorithm.
Result will be written to the boolean array.
So, each block will be responsible for keyword-data pair.
2nd:
gridDim => (# of data strings,1,1)
blockDim => (# of keyword strings,1,1)
In each block, data string will be loaded to shared mem.
In this case, each thread will be responsible for keyword-data pair instead of block.
Each thread will search corresponding keyword inside the data string.
Naive algorithm is not necessary in this case, Boyer-Moore might be used.
For searches inside huge files, the 1st approach is used, since the length of the data is much bigger than the length of the keyword. But in this case, I am not sure whether the 1st approach is better. On the other hand, for the 2nd approach, coalescing the keyword reads might be a problem, since the lengths are not fixed. There is an upper bound on the size of the keywords, so padding might ease the coalescing, but it would consume more memory.
Anyhow, if you have worked on a similar case or know about a better approach than those I described above, please help me out.
Thank you in advance.
So, I've implemented both of the cases. Code for approach 1:
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <iostream>
#include <chrono>
#include <cstdlib>
#define SEARCHTERMSIZE 100
#define SEARCHITEMSIZE 65000
#define MAXDATASTRINGSIZE 250
#define MAXKEYWORDSTRINGSSIZE 50
using namespace std;
__global__ void searchKeywordKernel(bool* resultPtr, const char * dataPtr, const short* dataLengths, const char * keywordPtr, const short* keywordLengths)
{
int dataIndex = blockIdx.x;
int keywordIndex = blockIdx.y;
int dataLength = dataLengths[dataIndex];
int keywordLength = keywordLengths[keywordIndex];
__shared__ char sData[MAXDATASTRINGSIZE];
__shared__ char sKeyword[MAXKEYWORDSTRINGSSIZE];
__shared__ bool isFound;
if (dataIndex < SEARCHITEMSIZE && keywordIndex < SEARCHTERMSIZE)
{
if (dataLength < keywordLength)
{
resultPtr[keywordIndex*SEARCHITEMSIZE + dataIndex] = false;
}
else
{
isFound = false;
sData[threadIdx.x] = dataPtr[dataIndex*MAXDATASTRINGSIZE + threadIdx.x];
if (threadIdx.x < keywordLength)
sKeyword[threadIdx.x] = keywordPtr[keywordIndex*MAXKEYWORDSTRINGSSIZE + threadIdx.x];
__syncthreads();
if (threadIdx.x <= dataLength - keywordLength)
{
for (int i = 0; i < keywordLength && !isFound; i++)
{
if (sData[threadIdx.x + i] != sKeyword[i])
break;
if (i == keywordLength - 1)
isFound = true;
}
}
__syncthreads(); // every window must be checked before isFound is read
if (threadIdx.x == 0)
resultPtr[keywordIndex*SEARCHITEMSIZE + dataIndex] = isFound;
}
}
}
int main()
{
chrono::steady_clock::time_point startTime;
chrono::steady_clock::time_point endTime;
typedef chrono::duration<int, milli> millisecs_t;
//////////Search Data Init/////////////////
cout << "Before Search Data Init" << endl;
startTime = chrono::steady_clock::now();
char* dataPtr = new char[MAXDATASTRINGSIZE*SEARCHITEMSIZE]; // new[] so that it matches the delete[] at the end
short* dataLengths = new short[SEARCHITEMSIZE];
short temp;
short tempChar;
for (int i = 0; i < SEARCHITEMSIZE; i++)
{
temp = rand() % (MAXDATASTRINGSIZE - 20) + 20;
for (int k = 0; k < temp; k++)
{
tempChar = rand() % 26;
dataPtr[i*MAXDATASTRINGSIZE + k] = 97 + tempChar; //97->a, 98->b, 122->z
}
dataLengths[i] = temp;
}
endTime = chrono::steady_clock::now();
millisecs_t duration(chrono::duration_cast<millisecs_t>(endTime - startTime));
cout << "After Search Data Init: " << duration.count() << "ms" << endl;
//////////Search Data Init/////////////////
//////////Search Keyword Init/////////////////
cout << "Before Search Keyword Init" << endl;
startTime = chrono::steady_clock::now();
char* keywordPtr = new char[MAXKEYWORDSTRINGSSIZE*SEARCHTERMSIZE]; // new[] so that it matches the delete[] at the end
short* keywordLengths = new short[SEARCHTERMSIZE]; //lenghts, not the start positions
for (int i = 0; i < SEARCHTERMSIZE; i++)
{
temp = rand() % (MAXKEYWORDSTRINGSSIZE - 10) + 10;
for (int k = 0; k < temp; k++)
{
tempChar = rand() % 26;
keywordPtr[i*MAXKEYWORDSTRINGSSIZE + k] = 97 + tempChar; //97->a, 98->b, 122->z
}
keywordLengths[i] = temp;
}
endTime = chrono::steady_clock::now();
millisecs_t duration1(chrono::duration_cast<millisecs_t>(endTime - startTime));
cout << "After Search Keyword Init: " << duration1.count() << "ms" << endl;
//////////Search Keyword Init/////////////////
char* d_dataPtr;
short* d_dataLengths;
char* d_keywordPtr;
short* d_keywordLengths;
bool* d_resultPtr;
/////////////////////////CudaMalloc/////////////////////////////////
cout << "Before Malloc" << endl;
startTime = chrono::steady_clock::now();
cudaMalloc(&d_dataPtr, sizeof(char) * SEARCHITEMSIZE * MAXDATASTRINGSIZE);
cudaMalloc(&d_dataLengths, sizeof(short) * SEARCHITEMSIZE);
cudaMalloc(&d_keywordPtr, sizeof(char) * SEARCHTERMSIZE*MAXKEYWORDSTRINGSSIZE);
cudaMalloc(&d_keywordLengths, sizeof(short) * SEARCHTERMSIZE);
cudaMalloc(&d_resultPtr, sizeof(bool)*SEARCHITEMSIZE * SEARCHTERMSIZE);
endTime = chrono::steady_clock::now();
millisecs_t duration2(chrono::duration_cast<millisecs_t>(endTime - startTime));
cout << "After Malloc: " << duration2.count() << "ms" << endl;
/////////////////////////CudaMalloc/////////////////////////////////
cudaEvent_t start, stop;
float elapsedTime;
/////////////////////////CudaMemCpy///////////////////////////////////
cout << "Before Memcpy" << endl;
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord(start, 0);
cudaMemcpy(d_dataPtr, dataPtr, sizeof(char) * SEARCHITEMSIZE * MAXDATASTRINGSIZE, cudaMemcpyHostToDevice);
cudaMemcpy(d_dataLengths, dataLengths, sizeof(short) * SEARCHITEMSIZE, cudaMemcpyHostToDevice);
cudaMemcpy(d_keywordPtr, keywordPtr, sizeof(char) * SEARCHTERMSIZE*MAXKEYWORDSTRINGSSIZE, cudaMemcpyHostToDevice);
cudaMemcpy(d_keywordLengths, keywordLengths, sizeof(short) * SEARCHTERMSIZE, cudaMemcpyHostToDevice);
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);
cudaEventElapsedTime(&elapsedTime, start, stop);
cudaEventDestroy(start);
cudaEventDestroy(stop);
cout << "After Memcpy: " << elapsedTime << "ms" << endl;
/////////////////////////CudaMemCpy///////////////////////////////////
////////////////////////Kernel//////////////////////////////////////////
cout << "Before Kernel" << endl;
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord(start, 0);
dim3 dimGrid(SEARCHITEMSIZE,SEARCHTERMSIZE);
searchKeywordKernel << < dimGrid, MAXDATASTRINGSIZE >> >(d_resultPtr, d_dataPtr, d_dataLengths, d_keywordPtr, d_keywordLengths);
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);
cudaEventElapsedTime(&elapsedTime, start, stop);
cudaEventDestroy(start);
cudaEventDestroy(stop);
cout << "After Kernel: " << elapsedTime << "ms" << endl;
////////////////////////Kernel//////////////////////////////////////////
bool* result = new bool[SEARCHTERMSIZE*SEARCHITEMSIZE];
cudaMemcpy(result, d_resultPtr, sizeof(bool) * SEARCHITEMSIZE * SEARCHTERMSIZE, cudaMemcpyDeviceToHost);
/////////////////////////////////// CPU code //////////////////////////////////////////
bool* cpuResult = new bool[SEARCHTERMSIZE*SEARCHITEMSIZE];
cout << "CPU code starts" << endl;
startTime = chrono::steady_clock::now();
for (int i = 0; i < SEARCHTERMSIZE; i++)
{
for (int j = 0; j < SEARCHITEMSIZE; j++)
{
if (dataLengths[j] < keywordLengths[i])
{
cpuResult[i*SEARCHITEMSIZE + j] = false; // keyword cannot fit; keep going with the next data string
}
else
{
for (int k = 0; k <= dataLengths[j] - keywordLengths[i]; k++)
{
cpuResult[i*SEARCHITEMSIZE + j] = true;
for (int l = 0; l < keywordLengths[i]; l++)
{
if (dataPtr[j*MAXDATASTRINGSIZE + k + l] != keywordPtr[i*MAXKEYWORDSTRINGSSIZE + l])
{
cpuResult[i*SEARCHITEMSIZE + j] = false;
break;
}
}
if (cpuResult[i*SEARCHITEMSIZE + j])
break;
}
}
}
}
endTime = chrono::steady_clock::now();
millisecs_t duration3(chrono::duration_cast<millisecs_t>(endTime - startTime));
cout << "CPU code ends: " << duration3.count() << "ms" << endl;
/////////////////////////////////// CPU code //////////////////////////////////////////
////////////////////////////////////Result Comparison////////////////////////////////////////
bool kernelRes = true;
for (int i = 0; i < SEARCHITEMSIZE*SEARCHTERMSIZE; i++)
{
if (cpuResult[i] != result[i])
{
kernelRes = false;
break;
}
}
////////////////////////////////////Result Comparison////////////////////////////////////////
cout << boolalpha << "Kernel computation: " << kernelRes << endl;
cout << "Before Deleting arrays" << endl;
delete[] dataPtr;
delete[] keywordPtr;
delete[] dataLengths;
delete[] keywordLengths;
delete[] result;
delete[] cpuResult;
cout << "After Deleting arrays" << endl;
cout << "Before Freeing device memory" << endl;
cudaFree(d_dataPtr);
cudaFree(d_keywordPtr);
cudaFree(d_dataLengths);
cudaFree(d_keywordLengths);
cudaFree(d_resultPtr);
cout << "After Freeing device memory" << endl;
cudaDeviceReset();
system("pause");
return 0;
}
Code for approach 2:
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <iostream>
#include <chrono>
#include <cstdlib>
#define SEARCHTERMSIZE 198
#define SEARCHITEMSIZE 65000
#define MAXDATASTRINGSIZE 250
#define MAXKEYWORDSTRINGSSIZE 50
using namespace std;
__global__ void searchKeywordKernel(bool* resultPtr, const char * __restrict__ dataPtr, const short* dataLengths, const char * keywordPtr, const short* keywordLengths)
{
int dataIndex = blockIdx.x;
int keywordIndex = threadIdx.x;
int dataLength = dataLengths[dataIndex];
int keywordLength = keywordLengths[keywordIndex];
__shared__ char sData[MAXDATASTRINGSIZE];
if (dataIndex < SEARCHITEMSIZE)
{
int my_tid = keywordIndex;
while (my_tid < dataLength)
{
sData[my_tid] = dataPtr[dataIndex*MAXDATASTRINGSIZE + my_tid];
my_tid += blockDim.x;
}
__syncthreads();
if (keywordIndex < SEARCHTERMSIZE)
{
if (dataLength < keywordLength)
{
resultPtr[dataIndex*SEARCHTERMSIZE + keywordIndex] = false;
}
else
{
bool isFound = false;
for (int i = 0; i <= dataLength - keywordLength; i++)
{
isFound = true; // reset for each window; cleared on the first mismatch
for (int j = 0; j < keywordLength; j++)
{
if (sData[i + j] != keywordPtr[j*SEARCHTERMSIZE + keywordIndex])
{
isFound = false;
break;
}
}
if (isFound)
break;
}
resultPtr[dataIndex*SEARCHTERMSIZE + keywordIndex] = isFound;
}
}
}
}
int main()
{
chrono::steady_clock::time_point startTime;
chrono::steady_clock::time_point endTime;
typedef chrono::duration<int, milli> millisecs_t;
//////////Search Data Init/////////////////
cout << "Before Search Data Init" << endl;
startTime = chrono::steady_clock::now();
char* dataPtr = new char[MAXDATASTRINGSIZE*SEARCHITEMSIZE]; // new[] so that it matches the delete[] at the end
short* dataLengths = new short[SEARCHITEMSIZE];
short temp;
short tempChar;
for (int i = 0; i < SEARCHITEMSIZE; i++)
{
temp = rand() % (MAXDATASTRINGSIZE - 20) + 20;
for (int k = 0; k < temp; k++)
{
tempChar = rand() % 26;
dataPtr[i*MAXDATASTRINGSIZE + k] = 97 + tempChar; //97->a, 98->b, 122->z
}
dataLengths[i] = temp;
}
endTime = chrono::steady_clock::now();
millisecs_t duration(chrono::duration_cast<millisecs_t>(endTime - startTime));
cout << "After Search Data Init: " << duration.count() << "ms" << endl;
//////////Search Data Init/////////////////
//////////Search Keyword Init/////////////////
cout << "Before Search Keyword Init" << endl;
startTime = chrono::steady_clock::now();
char* keywordPtr = new char[MAXKEYWORDSTRINGSSIZE*SEARCHTERMSIZE]; // new[] so that it matches the delete[] at the end
short* keywordLengths = new short[SEARCHTERMSIZE]; //lenghts, not the start positions
for (int i = 0; i < SEARCHTERMSIZE; i++)
{
temp = rand() % (MAXKEYWORDSTRINGSSIZE - 10) + 10;
for (int k = 0; k < temp; k++)
{
tempChar = rand() % 26;
keywordPtr[i*MAXKEYWORDSTRINGSSIZE + k] = 97 + tempChar; //97->a, 98->b, 122->z
}
keywordLengths[i] = temp;
}
endTime = chrono::steady_clock::now();
millisecs_t duration1(chrono::duration_cast<millisecs_t>(endTime - startTime));
cout << "After Search Keyword Init: " << duration1.count() << "ms" << endl;
//////////Search Keyword Init/////////////////
////////////////////Transpose Keyword Array////////////////////////////
char* keywordPtr_T = new char[SEARCHTERMSIZE*MAXKEYWORDSTRINGSSIZE];
for (int i = 0; i < SEARCHTERMSIZE; i++)
for (int j = 0; j < MAXKEYWORDSTRINGSSIZE; j++)
keywordPtr_T[j*SEARCHTERMSIZE + i] = keywordPtr[i*MAXKEYWORDSTRINGSSIZE + j];
////////////////////Transpose Keyword Array////////////////////////////
char* d_dataPtr;
short* d_dataLengths;
char* d_keywordPtr;
short* d_keywordLengths;
bool* d_resultPtr;
/////////////////////////CudaMalloc/////////////////////////////////
cout << "Before Malloc" << endl;
startTime = chrono::steady_clock::now();
cudaMalloc(&d_dataPtr, sizeof(char) * SEARCHITEMSIZE * MAXDATASTRINGSIZE);
cudaMalloc(&d_dataLengths, sizeof(short) * SEARCHITEMSIZE);
cudaMalloc(&d_keywordPtr, sizeof(char) * SEARCHTERMSIZE*MAXKEYWORDSTRINGSSIZE);
cudaMalloc(&d_keywordLengths, sizeof(short) * SEARCHTERMSIZE);
cudaMalloc(&d_resultPtr, sizeof(bool)*SEARCHITEMSIZE * SEARCHTERMSIZE);
endTime = chrono::steady_clock::now();
millisecs_t duration2(chrono::duration_cast<millisecs_t>(endTime - startTime));
cout << "After Malloc: " << duration2.count() << "ms" << endl;
/////////////////////////CudaMalloc/////////////////////////////////
cudaEvent_t start, stop;
float elapsedTime;
/////////////////////////CudaMemCpy///////////////////////////////////
cout << "Before Memcpy" << endl;
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord(start, 0);
cudaMemcpy(d_dataPtr, dataPtr, sizeof(char) * SEARCHITEMSIZE * MAXDATASTRINGSIZE, cudaMemcpyHostToDevice);
cudaMemcpy(d_dataLengths, dataLengths, sizeof(short) * SEARCHITEMSIZE, cudaMemcpyHostToDevice);
cudaMemcpy(d_keywordPtr, keywordPtr_T, sizeof(char) * SEARCHTERMSIZE*MAXKEYWORDSTRINGSSIZE, cudaMemcpyHostToDevice);
cudaMemcpy(d_keywordLengths, keywordLengths, sizeof(short) * SEARCHTERMSIZE, cudaMemcpyHostToDevice);
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);
cudaEventElapsedTime(&elapsedTime, start, stop);
cudaEventDestroy(start);
cudaEventDestroy(stop);
cout << "After Memcpy: " << elapsedTime << "ms" << endl;
/////////////////////////CudaMemCpy///////////////////////////////////
////////////////////////Kernel//////////////////////////////////////////
cout << "Before Kernel" << endl;
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord(start, 0);
searchKeywordKernel << < SEARCHITEMSIZE, SEARCHTERMSIZE >> >(d_resultPtr, d_dataPtr, d_dataLengths, d_keywordPtr, d_keywordLengths);
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);
cudaEventElapsedTime(&elapsedTime, start, stop);
cudaEventDestroy(start);
cudaEventDestroy(stop);
cout << "After Kernel: " << elapsedTime << "ms" << endl;
////////////////////////Kernel//////////////////////////////////////////
bool* result_T = new bool[SEARCHTERMSIZE*SEARCHITEMSIZE];
bool* result = new bool[SEARCHTERMSIZE*SEARCHITEMSIZE];
cudaMemcpy(result_T, d_resultPtr, sizeof(bool) * SEARCHITEMSIZE * SEARCHTERMSIZE, cudaMemcpyDeviceToHost);
for (int i = 0; i < SEARCHTERMSIZE; i++)
for (int j = 0; j < SEARCHITEMSIZE; j++)
result[i*SEARCHITEMSIZE + j] = result_T[j*SEARCHTERMSIZE + i]; // kernel writes [data][keyword]; CPU result is [keyword][data]
/////////////////////////////////// CPU code //////////////////////////////////////////
bool* cpuResult = new bool[SEARCHTERMSIZE*SEARCHITEMSIZE];
cout << "CPU code starts" << endl;
startTime = chrono::steady_clock::now();
for (int i = 0; i < SEARCHTERMSIZE; i++)
{
for (int j = 0; j < SEARCHITEMSIZE; j++)
{
if (dataLengths[j] < keywordLengths[i])
{
cpuResult[i*SEARCHITEMSIZE + j] = false; // keyword cannot fit; keep going with the next data string
}
else
{
for (int k = 0; k <= dataLengths[j] - keywordLengths[i]; k++)
{
cpuResult[i*SEARCHITEMSIZE + j] = true;
for (int l = 0; l < keywordLengths[i]; l++)
{
if (dataPtr[j*MAXDATASTRINGSIZE + k + l] != keywordPtr[i*MAXKEYWORDSTRINGSSIZE + l])
{
cpuResult[i*SEARCHITEMSIZE + j] = false;
break;
}
}
if (cpuResult[i*SEARCHITEMSIZE + j])
break;
}
}
}
}
endTime = chrono::steady_clock::now();
millisecs_t duration3(chrono::duration_cast<millisecs_t>(endTime - startTime));
cout << "CPU code ends: " << duration3.count() << "ms" << endl;
/////////////////////////////////// CPU code //////////////////////////////////////////
////////////////////////////////////Result Comparison////////////////////////////////////////
bool kernelRes = true;
for (int i = 0; i < SEARCHITEMSIZE*SEARCHTERMSIZE; i++)
{
if (cpuResult[i] != result[i])
{
kernelRes = false;
break;
}
}
////////////////////////////////////Result Comparison////////////////////////////////////////
cout << boolalpha << "Kernel computation: " << kernelRes << endl;
cout << "Before Deleting arrays" << endl;
delete[] dataPtr;
delete[] keywordPtr;
delete[] keywordPtr_T;
delete[] dataLengths;
delete[] keywordLengths;
delete[] result;
delete[] result_T;
delete[] cpuResult;
cout << "After Deleting arrays" << endl;
cout << "Before Freeing device memory" << endl;
cudaFree(d_dataPtr);
cudaFree(d_keywordPtr);
cudaFree(d_dataLengths);
cudaFree(d_keywordLengths);
cudaFree(d_resultPtr);
cout << "After Freeing device memory" << endl;
cudaDeviceReset();
system("pause");
return 0;
}
The second approach gave better results than the first. Yet its performance depends on the number of keywords: if the number of keywords is a multiple of 192, the GPU performs better than the CPU (time of malloc+memcpy+kernel < time of CPU).
What should I do to overcome this dependency? Would it be viable to increase the number of threads and process multiple data strings, rather than one, in each block?
I suggest blockDim = (16, 16, 1) and gridDim = (# of data strings / 16, # of keyword strings / 16, 1). In your case, where tens of strings can fit in shared memory, this block-grid division minimizes global memory accesses without introducing computation overhead.
Padding is not a good choice unless each string's length is expected to be quite close to the maximum (80% of the maximum, for example). If you keep an array of the offsets of every string (the CPU is good at generating it), coalescing global memory reads is trivial.
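The offset array can be produced on the host with a running sum while the strings are packed back to back. This is a plain C++ sketch of that idea only; the names PackedStrings and packStrings are hypothetical, and the resulting buffers would still have to be copied to the device with cudaMemcpy.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container: all characters back to back, plus one offset per string
// and a final offset equal to the total size.
struct PackedStrings {
    std::vector<char> chars;
    std::vector<std::size_t> offsets;
};

PackedStrings packStrings(const std::vector<std::string>& strings) {
    PackedStrings packed;
    packed.offsets.reserve(strings.size() + 1);
    for (const std::string& s : strings) {
        // string i starts where the previous one ended
        packed.offsets.push_back(packed.chars.size());
        packed.chars.insert(packed.chars.end(), s.begin(), s.end());
    }
    // sentinel: length of string i is offsets[i + 1] - offsets[i]
    packed.offsets.push_back(packed.chars.size());
    return packed;
}
```

Compared with padding every string to the maximum length, this layout stores only the characters that exist, at the cost of one extra index per string.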
Can you help me find out why one of the FFTW's plans gives zeroes at the end of an output array? The "fftw_plan_dft_1d" yields proper result as I checked it with Matlab. The Real to Complex plan "fftw_plan_dft_r2c_1d" makes some zeroes at the end. I don't understand why.
Here is the simple testing code using both plans.
#include <iostream>
#include <complex.h>
#include <fftw3.h>
using namespace std;
int main()
{
fftw_complex *in, *out, *out2;
double array[] = {1.0,2.0,3.0,4.0,5.0,6.0,0.0,0.0};
fftw_plan p, p2;
int N = 8;
in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
out2 = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
for (int i = 0; i < N; i++) {
in[i] = i+1+0*I;
}
in[6] = 0+0*I;
in[7] = 0+0*I;
cout << "complex array" << endl;
for (int i = 0; i < N; i++) {
cout << "[" << i << "]: " << creal(in[i]) << " + " << cimag(in[i]) << "i" << endl;
}
cout << endl;
cout << "real array" << endl;
for (int i = 0; i < N; i++) {
cout << "[" << i << "]: " << array[i] << endl;
}
cout << endl;
p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
p2 = fftw_plan_dft_r2c_1d(N, array, out2, FFTW_ESTIMATE);
fftw_execute(p); /* repeat as needed */
fftw_execute(p2);
cout << "fftw_plan_dft_1d:" << endl;
for (int i = 0; i < N; i++) {
cout << "[" << i << "]: " << creal(out[i]) << " + " << cimag(out[i]) << "i" << endl;
}
cout << endl;
cout << "fftw_plan_dft_r2c_1d:" << endl;
for (int i = 0; i < N; i++) {
cout << "[" << i << "]: " << creal(out2[i]) << " + " << cimag(out2[i]) << "i" << endl;
}
cout << endl;
fftw_destroy_plan(p);
fftw_destroy_plan(p2);
fftw_free(in);
fftw_free(out);
fftw_free(out2);
return 0;
}
Result:
complex array
[0]: 1 + 0i
[1]: 2 + 0i
[2]: 3 + 0i
[3]: 4 + 0i
[4]: 5 + 0i
[5]: 6 + 0i
[6]: 0 + 0i
[7]: 0 + 0i
real array
[0]: 1
[1]: 2
[2]: 3
[3]: 4
[4]: 5
[5]: 6
[6]: 0
[7]: 0
fftw_plan_dft_1d:
[0]: 21 + 0i
[1]: -9.65685 + -3i
[2]: 3 + -4i
[3]: 1.65685 + 3i
[4]: -3 + 0i
[5]: 1.65685 + -3i
[6]: 3 + 4i
[7]: -9.65685 + 3i
fftw_plan_dft_r2c_1d:
[0]: 21 + 0i
[1]: -9.65685 + -3i
[2]: 3 + -4i
[3]: 1.65685 + 3i
[4]: -3 + 0i
[5]: 0 + 0i
[6]: 0 + 0i
[7]: 0 + 0i
As you can see there is this strange difference between both plans and the result should be the same.
As you have noted, the fftw_plan_dft_1d function computes the standard FFT Y_k of the complex input sequence X_n, defined as
$$Y_k = \sum_{n=0}^{N-1} X_n \, e^{-2\pi j k n / N}$$
where j=sqrt(-1), for all values k=0,...,N-1 (thus generating N complex outputs in the array out).
You may notice that since the input happens to be real, the output exhibits Hermitian symmetry, that is for N=8:
out[4] == conj(out[4]); // the central one (out[4] for N=8) must be real
out[5] == conj(out[3]);
out[6] == conj(out[2]);
out[7] == conj(out[1]);
where conj is the usual complex conjugate operator.
Of course, when using fftw_plan_dft_1d, FFTW doesn't know the input just happens to be real, and thus does not take advantage of the symmetry.
The fftw_plan_dft_r2c_1d on the other hand takes advantage of that symmetry, and as indicated in "What FFTW Really Computes" section for "1d real data" of FFTW's documentation (emphasis mine):
As a result of this symmetry, half of the output Y is redundant (being the complex conjugate of the other half), and so the 1d r2c transforms only output elements 0...n/2 of Y (n/2+1 complex numbers), where the division by 2 is rounded down.
Thus in your case with N=8, only N/2+1 == 5 complex values are filled in out2, leaving the remaining 3 uninitialized (those values just happened to be zeros before the r2c transform ran; do not rely on them being set to 0). If needed, those other values could of course be obtained from symmetry with:
for (int i = (N/2)+1; i < N; i++) {
out2[i] = conj(out2[N-i]);
}
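The same reconstruction can be tried without FFTW at all. The sketch below uses std::complex and a hypothetical helper named expandHalfSpectrum; it only illustrates the symmetry rule Y[k] = conj(Y[N-k]) and is not an FFTW API.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: given the first n/2+1 outputs of a real-input transform,
// rebuild all n outputs using Hermitian symmetry.
std::vector<std::complex<double>> expandHalfSpectrum(
    const std::vector<std::complex<double>>& half, std::size_t n) {
    std::vector<std::complex<double>> full(n);
    // copy the part the r2c transform actually produced
    for (std::size_t k = 0; k < half.size() && k < n; ++k) full[k] = half[k];
    // fill the redundant half: Y[k] = conj(Y[n - k])
    for (std::size_t k = n / 2 + 1; k < n; ++k) full[k] = std::conj(full[n - k]);
    return full;
}
```

Feeding it the five r2c values from the result listing reproduces entries [5] to [7] of the fftw_plan_dft_1d output.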