I'm not very good with C++. My code compiles, but the function crashes my program. Below is a short summary of the code; it's not complete, but the function and the call are there.
void rot13(char *ret, const char *in);
int main()
{
char* str;
MessageBox(NULL, _T("Test 1; Does get here!"), _T("Test 1"), MB_OK);
rot13(str, "uryyb jbeyq!"); // hello world!
/* Do stuff with char* str; */
MessageBox(NULL, _T("Test 2; Doesn't get here!"), _T("Test 2"), MB_OK);
return 0;
}
void rot13(char *ret, const char *in){
for( int i=0; i = sizeof(in); i++ ){
if(in[i] >= 'a' && in[i] <= 'm'){
// Crashes Here;
ret[i] += 13;
}
else if(in[i] > 'n' && in[i] <= 'z'){
// Possibly crashing Here too?
ret[i] -= 13;
}
else if(in[i] > 'A' && in[i] <= 'M'){
// Possibly crashing Here too?
ret[i] += 13;
}
else if(in[i] > 'N' && in[i] <= 'Z'){
// Possibly crashing Here too?
ret[i] -= 13;
}
}
}
The program gets to "Test 1; Does get here!" but never reaches "Test 2; Doesn't get here!"
Thank you in advance.
-Nick Daniels.
str is uninitialised and it is being dereferenced in rot13, causing the crash. Allocate memory for str before passing to rot13() (either on the stack or dynamically):
char str[1024] = ""; /* Large enough to hold string and initialised. */
The for loop inside rot13() is also incorrect (infinite loop):
for( int i=0; i = sizeof(in); i++ ){
change to:
for(size_t i = 0, len = strlen(in); i < len; i++ ){
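To see why sizeof(in) is the wrong tool here, a minimal sketch follows. The function names pointer_size and text_length are made up for illustration; only sizeof and strlen come from the answer above.

```cpp
#include <cstddef>
#include <cstring>

// Size of the pointer itself: typically 4 or 8 bytes, whatever the text is.
std::size_t pointer_size(const char *in) {
    return sizeof(in);
}

// Number of characters before the terminating '\0'.
std::size_t text_length(const char *in) {
    return std::strlen(in);
}
```

For the test string "uryyb jbeyq!", text_length gives 12, while pointer_size gives the size of a char* on the target platform.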
You've got several problems:
You never allocate memory for your output - you never initialise the variable str. This is what's causing your crash.
Your loop condition always evaluates to true (= assigns and returns the assigned value, == tests for equality).
Your loop condition uses sizeof(in) with the intention of getting the size of the input string, but that will actually give you the size of the pointer. Use strlen instead.
Your algorithm adds 13 to, or subtracts 13 from, whatever is already in the output string (ret[i] += 13). The result should be computed from the input string instead (ret[i] = in[i] + 13).
Your algorithm doesn't handle 'A', 'n' or 'N'.
Your algorithm doesn't handle any non-alphabetic characters, yet the test string you use contains two.
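Putting the points above together, one possible corrected version might look like the sketch below. It assumes the caller passes an output buffer with room for at least strlen(in) + 1 bytes; that requirement is my assumption, not something the original code states.

```cpp
#include <cstddef>
#include <cstring>

// Writes the ROT13 transform of `in` into `ret`.
// Assumption: `ret` points to at least strlen(in) + 1 writable bytes.
void rot13(char *ret, const char *in) {
    const std::size_t len = std::strlen(in);
    for (std::size_t i = 0; i < len; ++i) {
        const char c = in[i];
        if ((c >= 'a' && c <= 'm') || (c >= 'A' && c <= 'M')) {
            ret[i] = static_cast<char>(c + 13);  // first half of the alphabet
        } else if ((c >= 'n' && c <= 'z') || (c >= 'N' && c <= 'Z')) {
            ret[i] = static_cast<char>(c - 13);  // second half of the alphabet
        } else {
            ret[i] = c;  // spaces, punctuation and digits pass through
        }
    }
    ret[len] = '\0';  // terminate the output string
}
```

Called with a buffer such as char str[1024] and the input "uryyb jbeyq!", this produces "hello world!".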
Here is my function:
int repeatedNTimes(int* A, int ASize)
{
int i, count, j, temp;
for(i = 0; i < ASize; ++i)
{
count = 0;
temp = A[i];
for(j = i; j < ASize; ++j)
{
if(A[i] == A[j])
count++;
}
if(count == ASize / 2)
return A[i];
else
continue;
}
return 0;
}
Can I use return 1, or return (any integer) instead of return 0?
And secondly, what if I don't return an integer?
If control reaches the end of a non-void function without a return statement, the behavior is undefined in C++; in C it is undefined as soon as the caller uses the result. Your compiler will likely emit a warning if you have warnings on.
As for returning an integer other than 0, yes, you can do that. What matters is the return type of the function when it comes to what you can and cannot return. That said, returning a different result may not have the effect you want depending on what your function does. Sometimes values like zero are reserved for special conditions like not found.
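As a sketch of the "reserved value" idea, the version below returns a sentinel instead of 0 when nothing is found. The constant kNotFound and the choice of -1 are assumptions for illustration; they only make sense if -1 can never be a valid array element.

```cpp
// Hypothetical sentinel: -1 signals "no element occurs ASize / 2 times".
// This only works if -1 can never appear as a real element of the array.
const int kNotFound = -1;

int repeatedNTimes(const int *A, int ASize) {
    for (int i = 0; i < ASize; ++i) {
        int count = 0;
        // Count occurrences of A[i] from position i onwards.
        for (int j = i; j < ASize; ++j) {
            if (A[i] == A[j]) ++count;
        }
        if (count == ASize / 2) return A[i];
    }
    return kNotFound;
}
```

With return 0 as the fallback, an input such as {0, 0, 1, 2}, where 0 really is the repeated element, cannot be told apart from "nothing found"; a sentinel outside the valid range avoids that ambiguity.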
When reading a JSON string from the serial port on an ESP8266 it cuts off the beginning of the data.
I have tried reading data from the serial port and printing each character, but part of the beginning of the data is cut off.
void setup() {
Serial.begin(115200);
while (!Serial) {
;
}
}
void loop() {
int curSize = 30;
char* buffer = new char[curSize];
std::fill_n(buffer, curSize, 0);
int pos = 0;
Serial.print("Sending: ");
while(Serial.available() == false) delay(500);
while (Serial.available()) {
char c = Serial.read();
Serial.print(c);
if(pos == curSize-1){
char* newBuffer = increaseBuffer(buffer, curSize, curSize + 30);
curSize += 30;
delete[] buffer;
buffer = newBuffer;
}
if(c == '\n'){
buffer[pos] = 0;
pos = 0;
break;
}
buffer[pos++] = c;
}
if(buffer[0] != 0) {
sendBuffer(buffer);
}
delete[] buffer;
}
char* increaseBuffer(char* orig, int oldSize, int newSize){
char* data = new char[newSize];
std::fill_n(data, newSize, 0);
for(int i = 0; i < newSize; i++){
if(i < oldSize) data[i] = orig[i];
else data[i] = '\0';
}
return data;
}
JSON data used (and expected output)
{"type":0,"ver":"0.0.1","T":[28,29,29,29,29,29,29,29,29,29],"H":[59.1608,59.1608,60,59.1608,60,60,60,59.1608,59.1608,59.1608],"DP":[20.36254,20.36254,20.59363,20.36254,20.59363,20.59363,20.59363,20.36254,20.36254],"HI":[30.90588,30.90588,31.0335,30.90588,31.0335,31.0335,31.0335,30.90588,30.90588]}
examples of what is actually output
Example 1: 9,29,29,29,29,29,29,29,29],"H":[59.1608,59.1608,60,59.1608,60,60,60,59.1608,59.1608,59.1608],"DP":[20.36254,20.36254,20.59363,20.36254,20.59363,20.59363,20.59363,20.36254,20.36254],"HI":[30.90588,30.90588,31.0335,30.90588,31.0335,31.0335,31.0335,30.90588,30.90588]}
Example 2: 29,29,29,29,29,29,29,29,29],"H":[59.1608,59.1608,60,59.1608,60,60,60,59.1608,59.1608,59.1608],"DP":[20.36254,20.36254,20.59363,20.36254,20.59363,20.59363,20.59363,20.36254,20.36254],"HI":[30.90588,30.90588,31.0335,30.90588,31.0335,31.0335,31.0335,30.90588,30.90588]}
Try making the delay 1 instead of 500 in the blocking loop that waits for data to start coming in. My guess is that on one iteration of that loop Serial.available() is false, data starts arriving during the delay, and the earliest characters have been overwritten by the time the delay ends and the check runs again.
Here is what I'm picturing, if you were to expand that delay(500) into delay(1) called 500 times:
while(Serial.available() == false){
delay(1);
delay(1);
// ...
delay(1); // first character comes in
delay(1);
delay(1); // second character comes in
// ...
delay(1); // nth character comes in
}
Then after the delay is over you start actually collecting the characters that are coming in.
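If the diagnosis above is right, another way to avoid losing early characters is to never block at all and to collect characters across calls instead. The sketch below is plain C++; LineAccumulator is a hypothetical helper, not part of the Arduino core, and it uses std::string rather than the manually grown buffer from the question.

```cpp
#include <string>

// Hypothetical helper: collects characters across calls until a newline
// arrives, so the caller never has to wait inside a delay.
class LineAccumulator {
public:
    // Feed one received character. Returns true when a full line is ready.
    bool feed(char c) {
        if (c == '\n') {
            line_.swap(pending_);  // publish the finished line
            pending_.clear();      // start collecting the next one
            return true;
        }
        pending_.push_back(c);
        return false;
    }

    // The most recently completed line, without the trailing newline.
    const std::string &line() const { return line_; }

private:
    std::string pending_;  // characters of the line still being received
    std::string line_;     // last complete line
};
```

On the ESP8266, loop() would then call feed(Serial.read()) for as long as Serial.available() is non-zero and simply return otherwise, handing line() to sendBuffer once feed returns true.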
I'm currently working on a server which is part of my course requirement. The specs require me to parse a request line and store the appropriate data as absolute path (abs_path) and query.
Here is my code:
bool parse(const char* line, char* abs_path, char* query)
{
int space = 0;
if (strchr(line, '"') != NULL)
{
error(400);
return false;
}
for (int i = 0; line[i] != '\0'; i++)
{
if (line[i] == ' ')
{
space++;
}
}
if (space != 2)
{
error(400);
return false;
}
if (strncmp("GET ", line, 4) != 0)
{
error(405);
return false;
}
line = strchr(line, ' ');
line++;
if (strncmp("/", line, 1) != 0)
{
error(501);
return false;
}
int j = 0;
int k = 4;
while (line[k] != ' ')
{
int m = k;
abs_path[j] = line[k];
j++;
if (line[k+1] == '?')
{
abs_path[j] = '\0';
int l = 0;
m = k+2;
while (line[m] != ' ')
{
query[l] = line[m];
l++;
m++;
}
if (line[m] == ' ' && l == 0)
{
query[0] = '\0';
}
}
k = m;
k++;
if (line[k] == ' ')
{
abs_path[j] = '\0';
break;
}
}
char* last = strrchr(line, ' ');
last++;
if (strcmp("HTTP/1.1", last) != 0)
{
error(505);
return false;
}
free(abs_path);
return true;
}
I keep getting a segmentation fault with this. After some debugging, I've found the segmentation fault to be eliminated if I declare, on line 20, abs_path as an array instead of a pointer. However, it is necessary for me to declare abs_path as a pointer, so I need another solution to this. Can someone explain to me what exactly I am doing wrong with regards to strings and their handling?
I have been quite rusty with this due to personal reasons so pardon me if I misunderstand something basic.
Thank you in advance!
You need to allocate memory for the char*. An array does this automatically during compile time. Try malloc.
char* abs_path = 0;
abs_path = (char*)malloc(256);
Essentially, malloc reserves a contiguous block of memory from the heap. Its argument is the number of bytes to reserve, so in the example above abs_path can point to a string of at most 255 characters (leaving 1 byte for the terminating null character '\0'). Don't let your code write more than 255 characters or you will overwrite other data in memory. Writing through a pointer that was never given any memory, as in your original code, is what caused the seg-fault.
As someone else noted, you should definitely free memory reserved dynamically with malloc once you are finished with it. Note that the free(abs_path) at the end of your parse releases the buffer before the caller can read the result, so the free belongs in the caller.
free(abs_path);
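The ownership pattern described above can be sketched as follows. Both copy_path and demo are hypothetical helpers written for this illustration; they are not the asker's parse function, and the 256-byte size is simply the figure used in the answer.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: copies the path part of `target` (everything before
// '?') into `abs_path`, never writing more than `cap` bytes including '\0'.
// Returns false if the buffer is too small.
bool copy_path(const char *target, char *abs_path, std::size_t cap) {
    const std::size_t len = std::strcspn(target, "?");
    if (cap == 0 || len >= cap) return false;
    std::memcpy(abs_path, target, len);
    abs_path[len] = '\0';
    return true;
}

// The caller owns the buffer: it allocates before the call and frees after
// it has finished reading the result.
bool demo() {
    char *abs_path = static_cast<char *>(std::malloc(256));
    if (abs_path == nullptr) return false;
    const bool ok = copy_path("/index.html?q=1", abs_path, 256) &&
                    std::strcmp(abs_path, "/index.html") == 0;
    std::free(abs_path);
    return ok;
}
```

Passing the capacity alongside the pointer lets the callee refuse to overflow the buffer instead of trusting the input to be short enough.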
So like:
void aLoop(){
int i = 0;
while(i < 10){
aFunction();
i++;
}
}
int aFunction(int i){
if(aVariable == 1){
i = 10;
}
if(aVariable != 1){
statement;
statement;
i = i;
}
return i;
}
Where aFunction() will be called for each i (0,1,2,3,...,9) and for each call will satisfy either the first if statement or the second.
Assuming all functions and variables are declared, would this be able to stop the while loop if aVariable == 1?
How else could you accomplish the same thing?
I'm really inexperienced with programming.
FIXED:
void aLoop(){
int i = 0;
while(i < 10){
i = aFunction(i);
i++;
}
}
int aFunction(int i){
if(aVariable == 1){
i = 10;
}
if(aVariable != 1){
statement;
statement;
i = i;
}
return i;
}
instead of
aFunction(x);
just use
i = aFunction(x);
Use return to terminate a method.
Use break to terminate a for/while loop or a switch statement.
void aLoop(){
int i = 0;
do{
aFunction();
System.out.print(i+" ");
i++;
}while(i < 10);
}
Your suggested solution under "FIXED" will work but if you wrote a large program using this approach you'd end up with software that would be very complex and very costly to work on. This is because aFunction is dependent on the function aLoop that calls it. Functions should ideally be independent of each other, whereas aFunction only works if it's being called from a while loop. You almost never want a called function to be dependent on the structure of the function that calls it.
Try to code so that the responsibilities or "intentions" of each part of the program are clear, and so that any dependencies are minimal and obvious. For example, here you could write:
void aLoop(){
bool continueProcessing = true;
for(int i=0;
continueProcessing && i < 10;
i++) {
continueProcessing = aFunction(i);
}
}
bool aFunction(int i){
bool stillProcessing = aVariable != 1;
if (stillProcessing) {
statement;
statement;
}
return stillProcessing;
}
Of course, in aLoop there are some other options which amount to the same thing. You could carry on with a while loop (I think for is clearer). Also, you could break out of the loop instead of having an extra continueProcessing variable.
void aLoop(){
for(int i=0; i < 10; i++) {
if (!aFunction(i))
break;
}
}
Finally, I'm not sure you even need to pass the variable i into aFunction. If you don't, or if some other data is more appropriate, it would be better to change this too.
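For a version that actually compiles, here is a minimal sketch of the break-based variant. The global callsMade and the rule that aVariable becomes 1 when i reaches 3 are stand-ins invented for the demonstration; the question only says "statement; statement;".

```cpp
// Hypothetical stand-ins for the question's aVariable and its statements.
int aVariable = 0;
int callsMade = 0;

// Returns true while processing should continue, false to stop the loop.
bool aFunction(int i) {
    if (aVariable == 1) return false;
    ++callsMade;                // stands in for "statement; statement;"
    if (i == 3) aVariable = 1;  // assumed trigger, for demonstration only
    return true;
}

void aLoop() {
    for (int i = 0; i < 10; i++) {
        if (!aFunction(i)) break;  // the caller decides how to stop
    }
}
```

Here aFunction only reports whether work should continue; it knows nothing about the loop bound of 10, so changing the loop does not require touching aFunction.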
I use a GTX 280, which has compute capability 1.3 and supports atomic operations on shared memory. I am using cuda SDK 2.2 and VS 2005. In my program I have to extensively use atomic operations because there is simply no other way.
One example is that I have to calculate the running sum of an array and find out the index where the sum exceeds a given cut off value. For this I am using a variant of scan algorithm and using atomicMin to store index while the value is less than the threshold. So this way at the end the shared memory would have the index where the value is just less than the threshold.
This is just one component of the kernel, and there are many similar code blocks in the kernel call.
I am having 3 problems:
Firstly, I have not been able to compile the code, as it says atomic operations are not defined. I have searched but have not found which file I have to add.
Second, I somehow managed to compile the code by copying it into the code provided by the CUDA SDK, but then it says that atomic operations are not supported on shared memory, whereas it is running in the following program.
Even when I worked around this with a hack, by giving -arch sm_12 on the compilation command line, the code snippets using these atomic operations take an awful lot of time.
I believe that even in the worst case I should get some sort of speed-up, because there are not very many atomic operations and I am using 1 block of 16x16. Unfortunately the serial code is running 10x faster.
Below I am posting the kernel code. This kernel call seems to be the bottleneck, so it would be nice if anyone could help me optimize it. The serial code just performs these actions in a serial manner. I am using a block configuration of 16 x 16.
The code seems to be lengthy but actually it contains an if code block and while code block that perform almost the same task, but they could not be merged.
#define limit (int)(log((float)256)/log((float)2))
// This receives a pointer to an image, some variables and 4 more arrays cont(of size 256) vars(some constants), lim and buf(of image size)
// block configuration 1 block of 16x16
__global__ void kernel_Main(unsigned char* in, int height,int width, int bs,int th, double cutoff, uint* cont,int* vars, unsigned int* lim,unsigned int* buf)
{
int j = threadIdx.x;
int i = threadIdx.y;
int k = i*blockDim.x+j;
__shared__ int prefix_sum[256];
__shared__ int sum_s[256];
__shared__ int ary_shared[256];
__shared__ int he_shared[256];
// this is the threshold
int cutval = (2*width*height)*cutoff;
prefix_sum[k] = cont[k];
int l;
// a variant of scan algorithm
for(l=0;l<=limit;l++)
{
sum_s[k]=prefix_sum[k];
if(k >= (int)pow((float)2,(float)l))
{
prefix_sum[k]+=sum_s[k-(int)pow((float)2,(float)l)];
// Find out the minimum index for which the cummulative sum crosses threshold
if(prefix_sum[k] > cutval)
{
atomicMin(&vars[cut],k);
}
}
__syncthreads();
}
// The first thread will store the value in global array
if(k==0)
{
vars[cuts]=prefix_sum[vars[cut]];
}
__syncthreads();
if(vars[n])
{
// bs = 7 in this case
if(i<bs && j<bs)
{
// using atomic add because the index could be same for 2 different threads
atomicAdd(&ary_shared[in[i*(width) + j]],1);
}
__syncthreads();
int minth = 1>((bs*bs)/20)? 1: ((bs*bs)/20);
prefix_sum[k] = ary_shared[k];
sum_s[k] = 0;
// Again prefix sum
int l;
for(l=0;l<=limit;l++)
{
sum_s[k]=prefix_sum[k];
if(k >= (int)pow((float)2,(float)l))
{
prefix_sum[k]+=sum_s[k-(int)pow((float)2,(float)l)];
// Find out the minimum index for which the cummulative sum crosses threshold
if(prefix_sum[k] > minth)
{
atomicMin(&vars[hmin],k);
}
}
__syncthreads();
}
// set the maximum value here
if(k==0)
{
vars[hminc]=prefix_sum[255];
// because we will always overshoot by 1
vars[hmin]--;
}
__syncthreads();
int maxth = 1>((bs*bs)/20)? 1: ((bs*bs)/20);
prefix_sum[k] = ary_shared[255-k];
for(l=0;l<=limit;l++)
{
sum_s[k]=prefix_sum[k];
if(k >= (int)pow((float)2,(float)l))
{
prefix_sum[k]+=sum_s[k-(int)pow((float)2,(float)l)];
// Find out the minimum index for which the cummulative sum crosses threshold
if(prefix_sum[k] > maxth)
{
atomicMin(&vars[hmax], k);
}
}
__syncthreads();
}
// set the maximum value here
if(k==0)
{
vars[hmaxc]=prefix_sum[255];
vars[hmax]--;
vars[hmax]=255-vars[hmax];
}
__syncthreads();
int rng = vars[hmax] - vars[hmin];
if(rng >= vars[cut])
{
if( k <= vars[hmin] )
he_shared[k] = 0;
else if( k >= vars[hmax])
he_shared[k] = 255;
else
he_shared[k] = (255 * (k - vars[hmin])) / rng;
}
__syncthreads();
// only 7x7 = 49 threads will do this
if(i>0 && i<=bs && j>0 && j<=bs)
{
int base = (vars[oy]*width+vars[ox])+ (i-1)*width + (j-1);
if(rng >= vars[cut])
{
int value = he_shared[in[base]];
buf[base]+=value;
lim[base]++;
}
else
{
buf[base]+=255;
lim[base]++;
}
}
if(k==0)
vars[n]--;
__syncthreads();
}// if(n) block closes here
while(vars[n])
{
if(k==0)
{
if( vars[ox]==0 && vars[d1] ==3 )
vars[d1] = 0; // l2r
else if( vars[ox]==0 && vars[d1]==2 )
vars[d1] = 3; // l u2d
else if( vars[ox]==width-bs && vars[d1]==0)
vars[d1] = 1; // r u2d
else if( vars[ox]==width-bs && vars[d1]==1)
vars[d1] = 2; // r2l
}
// Because this value will be changed so
// all the threads should set their registers before
// they move forward
int ox_d = vars[ox];
int oy_d = vars[oy];
// Just putting it here so that all the threads should have set their
// values before moving on, as this value will be changed
__syncthreads();
if(vars[d1]==0)
{
if(i == 0 && j < bs)
{
int index = j*width + ox_d + oy_d*width;
int index2 = j*width + ox_d + oy_d*width +bs;
atomicSub(&ary_shared[in[index]],1);
atomicAdd(&ary_shared[in[index2]],1);
}
// The first thread of the first block should set this value
if(k==0)
vars[ox]++;
}
else if(vars[d1]==1||vars[d1]==3)
{
if(i == 0 && j < bs)
{
/*if(j==0)
printf("Entered 1||3\n");*/
int index = j*width + ox_d + oy_d*width;
int index2 = j*width + ox_d + (oy_d+bs)*width;
atomicSub(&ary_shared[in[index]],1);
atomicAdd(&ary_shared[in[index2]],1);
}
// The first thread of the first block should set this value
if(k==0)
vars[oy]++;
}
else if(vars[d1]==2)
{
if(i == 0 && j < bs)
{
int index = j*width + ox_d-1 + oy_d*width;
int index2 = j*width + ox_d-1 + oy_d*width +bs;
atomicAdd(&ary_shared[in[index]],1);
atomicSub(&ary_shared[in[index2]],1);
}
// The first thread of the first block should set this value
if(k==0 )
vars[ox]--;
}
__syncthreads();
//ary_shared has been calculated
// Reset the hmin and hminc values
// again the same task as done in the if(n) loop
if(k==0)
{
vars[hmin]=0;
vars[hminc]=0;
vars[hmax]=0;
vars[hmaxc]=0;
}
__syncthreads();
int minth = 1>((bs*bs)/20)? 1: ((bs*bs)/20);
prefix_sum[k] = ary_shared[k];
int l;
for(l=0;l<=limit;l++)
{
sum_s[k]=prefix_sum[k];
if(k >= (int)pow((float)2,(float)l))
{
prefix_sum[k]+=sum_s[k-(int)pow((float)2,(float)l)];
// Find out the minimum index for which the cummulative sum crosses threshold
if(prefix_sum[k] > minth)
{
atomicMin(&vars[hmin],k);
}
}
__syncthreads();
}
// set the maximum value here
if(k==0)
{
vars[hminc]=prefix_sum[255];
vars[hmin]--;
}
__syncthreads();
// Calculate maxth
int maxth = 1>((bs*bs)/20)? 1: ((bs*bs)/20);
prefix_sum[k] = ary_shared[255-k];
for(l=0;l<=limit;l++)
{
sum_s[k]=prefix_sum[k];
if(k >= (int)pow((float)2,(float)l))
{
prefix_sum[k]+=sum_s[k-(int)pow((float)2,(float)l)];
// Find out the minimum index for which the cummulative sum crosses threshold
if(prefix_sum[k] > maxth)
{
atomicMin(&vars[hmax], k);
}
}
__syncthreads();
}
// set the maximum value here
if(k==0)
{
vars[hmaxc]=prefix_sum[255];
vars[hmax]--;
vars[hmax]=255-vars[hmax];
}
__syncthreads();
int rng = vars[hmax] - vars[hmin];
if(rng >= vars[cut])
{
if( k <= vars[hmin] )
he_shared[k] = 0;
else if( k >= vars[hmax])
he_shared[k] = 255;
else
he_shared[k] = (255 * (k - vars[hmin])) / rng;
}
__syncthreads();
if(i>0 && i<=bs && j>0 && j<=bs)
{
int base = (vars[oy]*width+vars[ox])+ (i-1)*width + (j-1);
if(rng >= vars[cut])
{
int value = he_shared[in[base]];
buf[base]+=value;
lim[base]++;
}
else
{
buf[base]+=255;
lim[base]++;
}
}
// This just might cause a little bit of problem
if(k==0)
vars[n]--;
// All threads will wait here before continuing the while loop
__syncthreads();
}// end of while(n)
}
Firstly you need -arch sm_12 (or in your case it should really be -arch sm_13) to enable atomic operations.
As for performance, there is no guarantee that your kernel will be any faster than normal code on the CPU - there are many problems which really do not fit well into the CUDA model and these may indeed run much slower than on the CPU. You need to do some analysis/design/modelling before coding any CUDA kernels to prevent yourself wasting a lot of time on something that is never going to fly.
Having said that, there may be a more efficient way to implement your algorithm. Maybe you could post the CPU code and then invite ideas as to how to implement it efficiently in CUDA?