Hi everyone, I have a question about templates in C++. I'd like to explain what I'm wondering about via an example. Let max() be our template function:
template <typename Type>
Type max(Type tX, Type tY)
{
return (tX > tY) ? tX : tY;
}
Now, when I call this max in my main, does the compiler generate the function for each call, replacing the template type with the actual types? I mean:
int main()
{
    int result1, result2;
    float result3;
    result1 = max(3, 5);
    result2 = max(10, 12);
    result3 = max(4.5, 12.2);
    return 0;
}
So will max be copied three times here, with its template parameter replaced by the actual types, or does something else happen? Can anyone help me? Thanks in advance.
My understanding is that a compiler typically instantiates a template once per data type per compilation unit, and the linker does clever stuff to stop code bloat: multiple copies of the same instantiation across all compilation units are condensed into one. The early Microsoft C++ linkers didn't bother doing any such thing, and the generated code was large.
In your example I would expect two functions to be generated: one instantiated for int (shared by the first two calls) and one for double (4.5 and 12.2 are double literals, so the third call instantiates max for double, and the result is converted to float on assignment).
Ah - I see you've edited the post so the last call takes two floating-point arguments rather than integral ones.
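To make that concrete, here is roughly what the compiler produces for the example above. This is only a sketch: the real symbols are the template instantiations for int and double, not separately named functions.

// Illustrative only: the two instantiations the compiler effectively generates.
int max(int tX, int tY)          // from max(3,5) and max(10,12)
{
    return (tX > tY) ? tX : tY;
}

double max(double tX, double tY) // from max(4.5, 12.2)
{
    return (tX > tY) ? tX : tY;
}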
I am trying to parallelize this for loop inside a function using OpenMP, but when I compile the code I still get an error =(
Error 1 error C3010: 'return' : jump out of OpenMP structured block not allowed.
I am using the Visual Studio 2010 C++ compiler. Can anyone help me? I appreciate any advice.
int match(char* pattern, int patternSize, char* string, int startFrom, unsigned int &comparisons) {
    comparisons = 0;
    #pragma omp for
    for (int i = 0; i < patternSize; i++) {
        comparisons++;
        if (pattern[i] != string[i + startFrom])
            return 0;
    }
    return 1;
}
As @Hristo has already mentioned, you are not allowed to branch out of a parallel region in OpenMP. Among other reasons, this is not allowed because the compiler cannot know a priori how many iterations each thread should work on when it splits up a for loop like the one that you have written among the different threads.
Furthermore, even if you could branch out of your loop, you should be able to see that comparisons would be computed incorrectly. As is, you have an inherently serial algorithm that breaks at the first different character. How could you split up this work such that throwing more threads at this algorithm possibly makes it faster?
Finally, note that there is very little work being done in this loop anyway. You would be very unlikely to see any benefit from OpenMP even if you could rewrite this algorithm into a parallel algorithm. My suggestion: drop OpenMP from this loop and look to implement it somewhere else (either at a higher level - maybe you call this method on different strings? - or in a section of your code that does more work).
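For illustration, here is a minimal sketch of how the early return could be removed so the loop becomes a valid OpenMP structured block. Note the caveats above still apply: this version always performs all patternSize comparisons (so the meaning of comparisons changes) and is unlikely to be faster than the serial loop.

#include <omp.h>

int match(char* pattern, int patternSize, char* string, int startFrom, unsigned int &comparisons) {
    int mismatches = 0;
    unsigned int comps = 0;
    // Each thread counts its own mismatches/comparisons; the reduction
    // clause combines the per-thread totals when the loop finishes.
    #pragma omp parallel for reduction(+:mismatches,comps)
    for (int i = 0; i < patternSize; i++) {
        comps++;
        if (pattern[i] != string[i + startFrom])
            mismatches++;   // record the mismatch instead of returning early
    }
    comparisons = comps;
    return mismatches == 0 ? 1 : 0;
}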
I'm still evaluating whether I should start using D for prototyping numerical code in physics.
One thing that stops me is that I like Boost, specifically Fusion and MPL.
D is amazing for template meta-programming, and I would think it can do MPL and Fusion stuff, but I would like to make sure.
Even if I start using D, it would take me a while to get to the MPL level, so I'd like someone to share their experience.
(By MPL I mean an STL for templates, and by Fusion, an STL for tuples.)
A note on performance would be nice too, since it's critical in physics simulations.
In D, for the most part, meta-programming is just programming. There's not really any need for a library like Boost.MPL.
For example, consider the lengths you would have to go to in C++ to sort an array of numbers at compile time. In D, you just do the obvious thing: use std.algorithm.sort.
import std.algorithm;

int[] sorted(int[] xs)
{
    int[] ys = xs.dup;
    sort(ys);
    return ys;
}
pragma(msg, sorted([2, 1, 3]));
This prints out [1, 2, 3] at compile time. Note: sort is not built into the language and has absolutely no special code for working at compile time.
Here's another example that builds a lookup table for Fibonacci sequence at compile time.
import std.algorithm : copy;
import std.exception : assumeUnique;
import std.range : recurrence, take;

int[] fibs(int n)
{
    auto fib = recurrence!("a[n-1] + a[n-2]")(1, 1);
    int[] ret = new int[n];
    copy(fib.take(n), ret);
    return ret;
}

immutable int[] fibLUT = fibs(10).assumeUnique();
Here, fibLUT is constructed entirely at compile time, again without any special compile time code needed.
If you want to work with types, there are a few type meta functions in std.typetuple. For example:
import std.traits : isUnsigned;
import std.typetuple;

static assert(is(Filter!(isUnsigned, int, byte, ubyte, dstring, dchar, uint, ulong) ==
                 TypeTuple!(ubyte, uint, ulong)));
That library, I believe, contains most of the functionality you can get from Fusion. Remember though, you really don't need template meta-programming machinery in D as much as you do in C++, because most of the language is available at compile time anyway.
I can't really comment on performance because I don't have deep experience with both. However, my instinct would be that D's compile-time execution is faster because you generally don't need to instantiate numerous templates. Of course, C++ compilers are more mature, so I could be wrong here. The only way you'll really find out is by trying it for your particular use case.
I have a question that I cannot answer myself, but it seems like a fundamentally good question to clear up:
Why do some languages restrict the data returned from a function to a single item?
Does this serve some benefit? Or is it a practice carried over from mathematics?
An example being (in Scala):
def login(username: String, password: String): User
If I want to return multiple items, I cannot express it in the same manner as I just did for the input arguments (now entering imaginary Scala land):
def login(username: String, password: String): (User, Context, String)
Or even with named data returned:
def login(username: String, password: String): (user: User, context: Context, serverMessage: String)
There is no relationship: as observed, an arbitrary number of values can be returned, even if they must be "packaged" into a single value.
Imagine a language that can only accept a single tuple and can only return a single tuple from a function (the tuples can be any size). These functions would then resemble mathematical functions transforming a vector from one space to another.
However, some reasons why it might be so:
Most functions only return one value, which may be a collection of values (object, sequence, etc.). Decomposition of the single value is supported in a number of languages, even though "only one value is returned".
The calling conventions and signatures are simpler: there is no special case/overhead to signal that n values are being returned; a single register will do, with no need to use part of the stack to return multiple values.
The need to fit in with the target architecture: earlier, especially lower-level languages, were heavily influenced by the computer architecture. In the case of Scala, for instance, it must work on the JVM.
It's just how the language was designed. Many (most?) languages borrow heavily -- syntax and/or methodologies -- from existing languages. Sometimes this is good, sometimes it is not so good. C# appeased Java appeased C++ appeased C, for instance: it's all about the market share.
It Just Works.
Even while "returning only one value", programming languages already have different ways of dealing with it. As noted in the post, some languages allow decomposition (the tuple returned as "decomposed" into it's two values during an assignment):
def multiMath(i):
    return (i + i, i * i)

doubled, squared = multiMath(4)
# doubled is 8
# squared is 16
Additionally, other languages like C#, which lacks decomposition, allow pass-by-reference (or emulate it with mutation of an object):
void multiMath(int a, out int doubled, out int squared) {
    doubled = a + a;
    squared = a * a;
}

int d, s;
multiMath(4, out d, out s);
// d is now 8
// s is now 16
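Languages with tuples in their standard library can express the decomposition style too. A minimal C++ sketch of the same function (std::tie plays the role of the assignment-side decomposition):

#include <tuple>

// Package the two results in a tuple and unpack them at the call site.
std::tuple<int, int> multiMath(int a) {
    return std::make_tuple(a + a, a * a);
}

int main() {
    int doubled, squared;
    std::tie(doubled, squared) = multiMath(4);
    // doubled is 8, squared is 16
}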
And, of course... ;-)
class ANewClassForThisFunctionsReturn {
    ...
}
There are likely more methods I am not aware of.
Happy coding.
Because typically the returned data is assigned to a variable, and only a few languages can assign to two variables in a single statement.
A = sum(1,2)
B,C = dateTime()
Technically there is no problem returning more than one value, because return values can be stacked just like parameters; the issue is on the assignment side. Here is a sample of where this would be needed:
/* div example */
#include <stdio.h>
#include <stdlib.h>

int main ()
{
    div_t divresult;
    divresult = div (38,5);
    printf ("38 div 5 => %d, remainder %d.\n", divresult.quot, divresult.rem);
    return 0;
}
vs. the hypothetical (not valid C):
long quot, rem;
quot, rem = div(38,5)
The code does not work, but when I comment out the atomicAdd in the following code, it works.
What is the reason for that?
And where can I find histogram code for a float array?
__global__ void calculateHistogram(float *devD, int* retHis)
{
    int globalLi = getCurrentThread(); //get the thread ID
    if(globalLi>=0 && globalLi<Rd*Cd*Dd)
    {
        int r=0,c=0,d=0;
        GetInd2Sub(globalLi, Rd, Cd, r, c, d); //some calculations to get r,c,d
        if(r>=stYd && r<edYd && c>=stXd && c<edXd && d>=stZd && d<edZd)
        {
            //calculate the histogram
            int indexInHis = GetBinNo(devD[globalLi]); //get the bin number in the histogram
            atomicAdd(&retHis[indexInHis],1); //when I comment this line the code works
        }
    }
}
Take a look at chapter 9 of CUDA by Example by Jason Sanders and Edward Kandrot. It covers atomics and goes through a simple example computing histograms of 8-bit integers. The first version uses an atomic add for each value, which works but is very slow. The refined version of the example computes a histogram for each block in shared memory, then merges all the histograms together into global memory to get the final result. Your code is like the first version, once you get it working you will want to make it more like the fast refined version.
You can download the examples from the book to see both versions: CUDA by Example downloads
You don't appear to give complete code or error messages, so I can't say exactly what is going wrong in your code. Here are some thoughts:
You need to compile for an architecture that supports atomics (i.e. something greater than the default 1.0 architecture target; with nvcc, e.g. -arch=sm_11 or higher).
The indexing and index limits appear somewhat complicated; I would double-check those.
Your bin calculation might be giving bin numbers outside the valid range for retHis; I would add some checks before using the return value, at least for debugging.
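For illustration, here is a minimal sketch of the shared-memory approach the book describes: each block accumulates a private histogram in shared memory, then merges it into the global result. NUM_BINS and the grid-stride launch are assumptions to adapt to your data; GetBinNo is your existing device function.

#define NUM_BINS 256  // assumption: adjust to your actual bin count

__global__ void histogramShared(const float *devD, int n, int *retHis)
{
    // One private histogram per block in shared memory.
    __shared__ int localHist[NUM_BINS];
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        localHist[b] = 0;
    __syncthreads();

    // Grid-stride loop over the input; atomics stay within the block.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
    {
        int bin = GetBinNo(devD[i]);
        if (bin >= 0 && bin < NUM_BINS)  // guard against out-of-range bins
            atomicAdd(&localHist[bin], 1);
    }
    __syncthreads();

    // Merge this block's histogram into the global histogram.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        atomicAdd(&retHis[b], localHist[b]);
}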
I haven't written C in quite some time and am writing an app using the MySQL C API, compiling with g++ on Red Hat.
So I started outputting some fields with printfs... Using the Oracle API with Pro*C, which I used years ago (on SUSE), I could select an int and output it as:
int some_int;
printf("%i",some_int);
I tried to do that with MySQL ints and got an 8-digit random number displayed... I thought this was a MySQL API issue or some config issue with my server, and I wasted a few hours trying to fix it, but couldn't. Then I found that I could do:
int some_int;
printf("%s",some_int);
and it would print out the integer properly. Because I'm not doing computations on the values I am extracting, I thought this was an okay solution.
UNTIL I TRIED TO COUNT SOME THINGS....
I did a simple:
int rowcount;
for ([stmt]) {
    rowcount++;
}
printf("%i", rowcount);
I am getting an 8-digit random number again... I couldn't figure out what the deal is with ints on this machine.
Then I realized that if I initialize the int to zero, I get a proper number.
Can someone please explain under what conditions you need to initialize int variables to zero? I don't recall doing this every time in my old codebase, and I didn't see it in the example that I was modeling my mysql_stmt code on...
Is there something I'm missing? Also, it's entirely possible I've forgotten this is required each time.
thanks...
If you don't initialize your variables, there's no guarantee of a default 0/NULL/whatever value. Some compilers MIGHT initialize it to 0 for you (IIRC, MSVC++ 6.0 would be kind enough to do so), and others might not. So don't rely on it. Never use a variable without first giving it some sort of sane value.
Only global and static variables will be initialized to zero. Variables on the stack will contain garbage values if not initialized.
int g_var; // This is a global variable, so it is initialized to zero.

int main()
{
    int s_var = 0;       // This is on the stack, so you need to initialize it explicitly.
    static int stat_var; // This is a static variable, so it is initialized to zero.
}
You always need to initialize your variables. To catch this sort of error, you should probably compile with -Wall to get all the warnings that g++ can provide. I also prefer to use -Werror to turn all warnings into errors, since it's almost always the case that a warning indicates an error or a potential error, and cleaning up the code is better than leaving it as is.
Also, in your second printf, you used %s which is for printing strings, not integers.
int i = 0;
printf("%d\n", i);
// or
printf("%i\n", i);
Is what you want.
Variables are not automatically initialized in C.
You have indeed forgotten. In C and C++, local variables don't get any automatic initialization; the contents of c after int c; are whatever happens to be at the address referred to by c at the time.
Best practice: initialize at the definition: int c = 0;.
Oh, PS, and take some care that the MySQL int type matches the C int type; I think it does but I'm not positive. It will be, however, both architecture and compiler sensitive, since sizeof(int) isn't the same in all C environments.
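If in doubt, a quick sanity check is easy: MySQL's INT column type is 32 bits, so verify that your local int is 4 bytes before binding result buffers to it. A minimal check:

#include <stdio.h>

int main(void)
{
    /* Print the local int width; MySQL INT columns hold 4 bytes. */
    printf("sizeof(int) = %zu bytes\n", sizeof(int));
    return 0;
}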
Uninitialized variable.
int some_int = 0;