What would be the outcome of calling the procedure below using each of these parameter-passing methods?
a) Pass by value
b) Pass by reference
c) Pass by name
d) Pass by value-result
void swap(int a, int b);
int main() {
int value = 2, list[5] = {1, 3, 5, 7, 9};
swap(value, list[0]);
swap(list[0], list[1]);
swap(value, list[value]);
return 0;
}
void swap(int a, int b) {
int temp;
temp = a;
a = b;
b = temp;
}
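The following is a minimal C sketch for checking the answers by hand. C itself only has pass by value, so the other three modes are emulated here. Pass by reference uses pointers. Pass by value-result copies the arguments in and then copies them back out to addresses taken at call time. Pass by name uses a macro, whose textual substitution re-evaluates `list[value]` at each use. The names `run`, `swap_value`, `swap_ref`, `swap_value_result` and `SWAP_BY_NAME` are hypothetical. Binding the value-result addresses at call time is an assumption; binding them at return would change the third call.

```c
#include <string.h>

enum mode { BY_VALUE, BY_REFERENCE, BY_NAME, BY_VALUE_RESULT };

/* (a) Pass by value: swap works on copies, so the caller never changes. */
static void swap_value(int a, int b)
{
    int temp = a;
    a = b;
    b = temp;
}

/* (b) Pass by reference, emulated with pointers. */
static void swap_ref(int *a, int *b)
{
    int temp = *a;
    *a = *b;
    *b = temp;
}

/* (d) Pass by value-result: copy in, swap the local copies,
   then copy out to the addresses that were taken at call time. */
static void swap_value_result(int *pa, int *pb)
{
    int a = *pa, b = *pb;   /* copy in */
    int temp = a;
    a = b;
    b = temp;
    *pa = a;                /* copy out */
    *pb = b;
}

/* (c) Pass by name, emulated with a macro: every use of a or b
   re-evaluates the argument text, e.g. list[value]. */
#define SWAP_BY_NAME(a, b) do { int temp_ = (a); (a) = (b); (b) = temp_; } while (0)

/* Runs the three calls from main() under mode m and reports the final state. */
void run(enum mode m, int *value_out, int list_out[5])
{
    int value = 2, list[5] = {1, 3, 5, 7, 9};
    switch (m) {
    case BY_VALUE:
        swap_value(value, list[0]);
        swap_value(list[0], list[1]);
        swap_value(value, list[value]);
        break;
    case BY_REFERENCE:
        swap_ref(&value, &list[0]);
        swap_ref(&list[0], &list[1]);
        swap_ref(&value, &list[value]);
        break;
    case BY_NAME:
        SWAP_BY_NAME(value, list[0]);
        SWAP_BY_NAME(list[0], list[1]);
        SWAP_BY_NAME(value, list[value]);
        break;
    case BY_VALUE_RESULT:
        swap_value_result(&value, &list[0]);
        swap_value_result(&list[0], &list[1]);
        swap_value_result(&value, &list[value]);
        break;
    }
    *value_out = value;
    memcpy(list_out, list, sizeof list);
}
```

Under these assumptions, every mode ends with value = 2. The list ends as follows:

- By value: list = {1, 3, 5, 7, 9}, unchanged.
- By reference and by value-result: list = {3, 1, 5, 7, 9}.
- By name: list = {3, 2, 1, 7, 9}. The third call's `list[value]` is evaluated only after `value` has already been changed to 2.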
I have to calculate how many distinct anagrams a given word has.
I have tried using factorials, permutations, and counting the possibilities for each letter in the word.
This is what I have so far:
static int DoAnagrams(string a, int x)
{
int anagrams = 1;
int result = 0;
x = a.Length;
for (int i = 0; i < x; i++)
{
anagrams *= (x - 1);
result += anagrams;
anagrams = 1;
}
return result;
}
For example, for aabv I should get 12, and for aaab I should get 4.
As already stated in a comment, there is a formula for calculating the number of different anagrams:
#anagrams = n! / (c_1! * c_2! * ... * c_k!)
where n is the length of the word, k is the number of distinct characters, and c_i is the number of times the i-th distinct character occurs.
So first, you need a function that calculates the factorial:
int fac(int n) {
int f = 1;
for (int i = 2; i <=n; i++) f*=i;
return f;
}
and you also need to count the characters in the word:
Dictionary<char, int> countChars(string word){
var r = new Dictionary<char, int>();
foreach (char c in word) {
if (!r.ContainsKey(c)) r[c] = 0;
r[c]++;
}
return r;
}
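Then the anagram count can be calculated as follows. Note that int overflows from 13! onwards, so words longer than 12 characters need long, and words longer than 20 characters need System.Numerics.BigInteger: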
int anagrams(string word) {
int ac = fac(word.Length);
var cc = countChars(word);
foreach (int ct in cc.Values)
ac /= fac(ct);
return ac;
}
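For comparison, here is a minimal C sketch of the same formula; the names `binomial` and `anagram_count` are hypothetical. It does not compute n! and then divide. Instead, it multiplies binomial coefficients C(remaining, c_i), which yields the same n! / (c_1! * ... * c_k!). The intermediate values stay far below n!, so this version overflows much later than the fac-based one. It counts bytes, so it assumes a single-byte encoding such as ASCII.

```c
/* C(m, k) computed incrementally: after step i, r equals C(m - k + i, i),
   so every division is exact. */
static unsigned long long binomial(unsigned m, unsigned k)
{
    unsigned long long r = 1;
    for (unsigned i = 1; i <= k; ++i)
        r = r * (m - k + i) / i;
    return r;
}

/* Distinct anagrams of word: n! / (c_1! * ... * c_k!), evaluated as
   C(n, c_1) * C(n - c_1, c_2) * ... to keep intermediate values small.
   Characters are counted per byte (single-byte encoding assumed). */
unsigned long long anagram_count(const char *word)
{
    unsigned counts[256] = {0};
    unsigned remaining = 0;
    for (const unsigned char *p = (const unsigned char *)word; *p; ++p) {
        ++counts[*p];
        ++remaining;
    }
    unsigned long long result = 1;
    for (int c = 0; c < 256; ++c) {
        if (counts[c] == 0)
            continue;
        result *= binomial(remaining, counts[c]);
        remaining -= counts[c];
    }
    return result;
}
```

With the examples from the question, anagram_count("aabv") is 12 (C(4,2) * C(2,1) * C(1,1)) and anagram_count("aaab") is 4 (C(4,3) * C(1,1)).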
Answer with Code
This is written in C#, so it may not match the language you want, but you didn't specify one.
This works by generating every permutation of the string, adding every duplicate found in the list to a second list, and then removing those duplicates from the original list. After that, the count of the original list is the number of unique anagrams the string has.
using System;
using System.Collections.Generic;
using System.Linq;
private static List<string> anagrams = new List<string>();
static void Main(string[] args)
{
string str = "AAAB";
char[] charArry = str.ToCharArray();
Permute(charArry, 0, str.Length - 1);
List<string> copyList = new List<string>();
for(int i = 0; i < anagrams.Count - 1; i++)
{
List<string> anagramSublist = anagrams.GetRange(i + 1, anagrams.Count - 1 - i);
var perm = anagrams.ElementAt(i);
if (anagramSublist.Contains(perm))
{
copyList.Add(perm);
}
}
foreach(var copy in copyList)
{
anagrams.Remove(copy);
}
Console.WriteLine(anagrams.Count);
Console.ReadKey();
}
static void Permute(char[] arry, int i, int n)
{
int j;
if (i == n)
{
var temp = string.Empty;
foreach(var character in arry)
{
temp += character;
}
anagrams.Add(temp);
}
else
{
for (j = i; j <= n; j++)
{
Swap(ref arry[i], ref arry[j]);
Permute(arry, i + 1, n);
Swap(ref arry[i], ref arry[j]); //backtrack
}
}
}
static void Swap(ref char a, ref char b)
{
char tmp;
tmp = a;
a = b;
b = tmp;
}
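The deduplication above generates all n! permutations first, which gets slow quickly. A different, swapped-in technique avoids duplicates entirely. Sort the characters, then step through permutations in lexicographic order, which is the same idea as C++'s std::next_permutation. The step comparisons use >= and <=, so equal characters are never treated as distinct and each distinct anagram is visited exactly once. The C sketch below illustrates that approach; the function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

static int cmp_char(const void *x, const void *y)
{
    return *(const unsigned char *)x - *(const unsigned char *)y;
}

/* Rearranges s[0..n-1] into the next lexicographically greater
   permutation. Returns 0 when s is already the last permutation. */
static int next_permutation(char *s, size_t n)
{
    if (n < 2)
        return 0;
    size_t i = n - 1;
    while (i > 0 && (unsigned char)s[i - 1] >= (unsigned char)s[i])
        --i;
    if (i == 0)
        return 0;                       /* non-increasing: last permutation */
    size_t j = n - 1;
    while ((unsigned char)s[j] <= (unsigned char)s[i - 1])
        --j;                            /* rightmost char greater than the pivot */
    char t = s[i - 1];
    s[i - 1] = s[j];
    s[j] = t;
    for (size_t a = i, b = n - 1; a < b; ++a, --b) {
        t = s[a];                       /* reverse the suffix s[i..n-1] */
        s[a] = s[b];
        s[b] = t;
    }
    return 1;
}

/* Counts distinct anagrams by enumerating them from the sorted order. */
unsigned long count_distinct_permutations(const char *word)
{
    size_t n = strlen(word);
    char *s = malloc(n + 1);
    if (s == NULL)
        return 0;
    memcpy(s, word, n + 1);
    qsort(s, n, 1, cmp_char);           /* smallest permutation first */
    unsigned long count = 1;
    while (next_permutation(s, n))
        ++count;
    free(s);
    return count;
}
```

For "AAAB", this visits AAAB, AABA, ABAA and BAAA, and then stops, giving 4.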
Final Notes
I know this isn't the cleanest or the best solution. It is simply the one that carries over best across the three object-oriented languages I know, without being too complex. It's simple to understand and simple to port to another language, so it's the answer I've decided to give.
EDIT
Here's a new answer based on the comments on this answer.
static void Main(string[] args)
{
var str = "abaa";
// n! is divided by c! for every character that occurs c > 1 times
// (dividing by the factorial of the summed counts is wrong, e.g. for "aabb")
long divisor = 1;
List<char> dupedCharacters = new List<char>();
foreach(var character in str)
{
var occurrences = str.Count(f => (f == character));
if(occurrences > 1 && !dupedCharacters.Contains(character))
{
divisor *= factorial(occurrences);
dupedCharacters.Add(character);
}
}
Console.WriteLine("The number of possible anagrams is: " + (factorial(str.Length) / divisor));
Console.ReadLine();
long factorial(int num)
{
if(num <= 1)
return 1;
return num * factorial(num - 1);
}
}
If I have the struct
cdef struct Interval:
unsigned int start
unsigned int end
unsigned int index
I can assign values to it like
i.start = 1
but can I set all the values (start, end, index) in one go?
I couldn't actually find this in the documentation, but Cython does support the equivalent of C struct initialization:
%%cython
def f():
cdef Interval i = [1, 1, 3]
return i.index
The generated C code is:
struct __pyx_t_46_cython_magic_f52bf70efc56b7361a3a2e15f913f262_Interval __pyx_t_1;
/* "_cython_magic_f52bf70efc56b7361a3a2e15f913f262.pyx":14
*
* def f():
* cdef Interval i = [1, 1, 3] # <<<<<<<<<<<<<<
* return i.index
*/
__pyx_t_1.start = 1;
__pyx_t_1.end = 1;
__pyx_t_1.index = 3;
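For reference, here is a plain C sketch of the struct initializers that the Cython list syntax corresponds to. The positional form matches the member-by-member assignments in the generated code above. The designated-initializer and compound-literal forms are C99 features, shown only for comparison and not as Cython syntax. `reset_interval` is a hypothetical helper.

```c
/* Plain C counterpart of the Interval struct from the question. */
struct Interval {
    unsigned int start;
    unsigned int end;
    unsigned int index;
};

/* Positional initializer: values are assigned in declaration order,
   like the assignments in the generated code. */
struct Interval positional = {1, 1, 3};

/* C99 designated initializer: members are named, so order does not matter. */
struct Interval designated = {.index = 3, .start = 1, .end = 1};

/* C99 compound literal: sets every member of an existing variable at once. */
struct Interval reset_interval(struct Interval i)
{
    i = (struct Interval){.start = 1, .end = 1, .index = 3};
    return i;
}
```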
I have the following problem that I want to implement on CUDA:
I want to read an array (say "flag[20]"), and based on a certain condition, write indices of this array to another array (say "pindex[]")
A simple implementation in C would be:
int N = 20;
int flag[N];
int pindex[N];
for(int i=0;i<N;i++)
flag[i] = -1;
for(int i=0;i<N;i+=2)
flag[i] = 0;
for(int i=0;i<N;i++)
pindex[i] = 0;
//operation: count # of times flag != -1 and write those indices in a different array
int pcount1 = 0;
for(int i=0;i<N;i++)
{
if(flag[i] != -1)
{
pindex[pcount1] = i;
++pcount1;
}
}
How will I implement this in CUDA?
I can use atomicAdd() to calculate the total number of times my condition is satisfied. But how do I write the indices to a different array? For example, I tried the following:
__global__ void kernel_tryatomic(int N,int* pcount,int* flag, int* pindex)
{
int tId=threadIdx.x;
int n=(blockIdx.x*2+blockIdx.y)*BlockSize+tId;
if(n > N-1) return;
if(flag[n] != -1)
{
atomicAdd(pcount,1);
atomicExch(&pindex[*pcount],n);
//pindex[*pcount] = n;
}
}
This code calculates "pcount" correctly, but does not update "pindex" array.
I need help to do this operation on GPUs.
Thanks
Since your condition (flag) is conceptually binary, you can use a binary prefix sum (thoroughly explained here) to determine the position to which each thread with a positive flag should write.
For example, if N is 20, with the help of the __device__ functions below:
__device__ unsigned int lanemask_lt(int lane) {
return (1u << lane) - 1;
}
__device__ int warp_prefix_sum(int lane, int p) {
const unsigned int mask = lanemask_lt( lane );
unsigned int b = __ballot_sync( 0xffffffff, p ); // on toolkits before CUDA 9: __ballot( p )
return __popc( b & mask );
}
your __global__ function can be written as follows (launch it with one full warp, e.g. kernel_scan<<<1, 32>>>(...)):
__global__ void kernel_scan(int N,int* pcount,int* flag, int* pindex)
{
int tId=threadIdx.x;
// Lanes at or beyond N still take part in the ballot, with a zero flag,
// instead of returning early, so the full-warp mask stays valid.
int threadFlag = ( tId < N && flag[tId] != -1 ) ? 1 : 0;
int position_to_write = warp_prefix_sum( tId & (warpSize-1), threadFlag );
if( threadFlag )
pindex[ position_to_write ] = tId;
// Total count: every lane votes again, lane 0 stores the result.
unsigned int total = __popc( __ballot_sync( 0xffffffff, threadFlag ) );
if( tId == 0 )
*pcount = total;
}
If N is bigger than the warp size (32), you can use the intra-block binary prefix sum explained in the same link.
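The ballot/popcount trick can be checked without a GPU. Below is a minimal host-side C sketch of one warp running kernel_scan. The ballot becomes a 32-bit mask with bit `lane` set when that lane's flag passes, and each passing lane writes at the popcount of the bits below its own. `warp_compact` and `popcount32` are hypothetical names. As in the answer, only a single warp (N <= 32) is handled.

```c
#include <stdint.h>

#define WARP_SIZE 32

/* Portable stand-in for CUDA's __popc: counts the set bits of x. */
static int popcount32(uint32_t x)
{
    int c = 0;
    while (x) {
        x &= x - 1;  /* clear the lowest set bit */
        ++c;
    }
    return c;
}

/* Host-side simulation of one warp running kernel_scan. Only the first
   WARP_SIZE entries are considered, matching the single-warp kernel.
   Returns the number of indices written to pindex. */
int warp_compact(int N, const int *flag, int *pindex)
{
    int lanes = N < WARP_SIZE ? N : WARP_SIZE;

    /* Like __ballot_sync: bit `lane` is set when that lane's flag passes. */
    uint32_t ballot = 0;
    for (int lane = 0; lane < lanes; ++lane)
        if (flag[lane] != -1)
            ballot |= (uint32_t)1 << lane;

    /* Each passing lane writes at popc(ballot & lanemask_lt(lane)),
       i.e. the number of passing lanes below it. */
    for (int lane = 0; lane < lanes; ++lane) {
        uint32_t bit = (uint32_t)1 << lane;
        if (ballot & bit)
            pindex[popcount32(ballot & (bit - 1))] = lane;
    }
    return popcount32(ballot);
}
```

With the flag array from the question (0 at even indices, -1 elsewhere, N = 20), this writes 0, 2, 4, ..., 18 to pindex and returns 10. That matches the serial loop.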
I have a problem, I think with __syncthreads(), in the following code:
__device__ void prefixSumJoin(const bool *g_idata, int *g_odata, int n)
{
__shared__ int temp[Config::bfr*Config::bfr]; // allocated on invocation
int thid = threadIdx.y*blockDim.x + threadIdx.x;
if(thid<(n>>1))
{
int offset = 1;
temp[2*thid] = (g_idata[2*thid]?1:0); // load input into shared memory
temp[2*thid+1] = (g_idata[2*thid+1]?1:0);
for (int d = n>>1; d > 0; d >>= 1) // build sum in place up the tree
{
__syncthreads();
if (thid < d)
{
int ai = offset*(2*thid+1)-1; // <-- breakpoint B
int bi = offset*(2*thid+2)-1;
temp[bi] += temp[ai];
}
offset *= 2;
}
if (thid == 0) { temp[n - 1] = 0; } // clear the last element
for (int d = 1; d < n; d *= 2) // traverse down tree & build scan
{
offset >>= 1;
__syncthreads();
if (thid < d)
{
int ai = offset*(2*thid+1)-1;
int bi = offset*(2*thid+2)-1;
int t = temp[ai];
temp[ai] = temp[bi];
temp[bi] += t;
}
}
__syncthreads();
g_odata[2*thid] = temp[2*thid]; // write results to device memory
g_odata[2*thid+1] = temp[2*thid+1];
}
}
__global__ void selectKernel3(...)
{
int tidx = threadIdx.x;
int tidy = threadIdx.y;
int bidx = blockIdx.x;
int bidy = blockIdx.y;
int tid = tidy*blockDim.x + tidx;
int bid = bidy*gridDim.x+bidx;
int noOfRows1 = ...;
int noOfRows2 = ...;
__shared__ bool isRecordSelected[Config::bfr*Config::bfr];
__shared__ int selectedRecordsOffset[Config::bfr*Config::bfr];
isRecordSelected[tid] = false;
selectedRecordsOffset[tid] = 0;
__syncthreads();
if(tidx<noOfRows1 && tidy<noOfRows2)
if(... == ...)
isRecordSelected[tid] = true;
__syncthreads();
prefixSumJoin(isRecordSelected,selectedRecordsOffset,Config::bfr*Config::bfr); // <-- breakpoint A
__syncthreads();
if(isRecordSelected[tid]==true){
{
some_instruction;// <-- breakpoint C
...
}
}
}
...
f(){
dim3 dimGrid(13, 5);
dim3 dimBlock(Config::bfr, Config::bfr);
selectKernel3<<<dimGrid, dimBlock>>>(...)
}
//other file
class Config
{
public:
static const int bfr = 16; // blocking factor = number of rows per block
public:
Config(void);
~Config(void);
};
The prefixSum is from GPU Gems 3: Parallel Prefix Sum (Scan) with CUDA, with small changes.
OK, now I set three breakpoints: A, B and C. They should be hit in the order A, B, C. The problem is that they are hit in the order A, B (x times), C, B, so at point C selectedRecordsOffset is not ready yet, which causes errors. After A, B is hit a few times but not for all iterations. Then C is hit and execution continues past it, and then B is hit again for the rest of the loop. The value of x depends on the input (for some inputs the breakpoints are not out of order, and C is the last one hit).
Moreover, if I look at the threads that trigger each hit, A and C are hit with threadIdx.y = 0 and B with threadIdx.y = 10. How is this possible when they are in the same block? Why do some threads skip the sync? There is no conditional sync.
Does anyone have any ideas on where to look for bugs?
If you need some more clarification, just ask.
Thanks in advance for any advice on how to work this out.
Adam
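Thou shalt not use __syncthreads() in conditional code if the condition does not evaluate uniformly across all threads of each block.
That is exactly what happens here. In prefixSumJoin, every __syncthreads() sits inside if(thid<(n>>1)). With n = Config::bfr*Config::bfr = 256 and a 16 x 16 block, only the threads with thid < 128 enter that branch, so only half of the block reaches the barriers and the behavior is undefined. Keep the barriers outside the condition and guard only the loads, the tree arithmetic and the stores.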
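The scan itself is easier to reason about serially. Below is a minimal C sketch of the same up-sweep/down-sweep (the work-efficient, Blelloch-style exclusive scan) that prefixSumJoin performs, with the GPU threads replaced by an inner loop. Each outer-loop iteration corresponds to one phase that the kernel separates with __syncthreads(). Every thread of the block must reach each of those barriers, even when thid >= d and it does no work in that phase. This is why the barriers cannot live inside a condition that only some threads satisfy. `exclusive_scan` is a hypothetical name, and n is assumed to be a power of two.

```c
/* Serial version of the up-sweep/down-sweep exclusive scan.
   n must be a power of two. The inner loops play the role of the
   threads with thid < d; the end of each outer iteration is where
   the kernel needs a block-wide __syncthreads(). */
void exclusive_scan(int *temp, int n)
{
    int offset = 1;

    /* Up-sweep: build partial sums in place up the tree. */
    for (int d = n >> 1; d > 0; d >>= 1) {
        for (int thid = 0; thid < d; ++thid) {
            int ai = offset * (2 * thid + 1) - 1;
            int bi = offset * (2 * thid + 2) - 1;
            temp[bi] += temp[ai];
        }
        offset *= 2;
    }

    temp[n - 1] = 0; /* clear the last element */

    /* Down-sweep: traverse down the tree and build the scan. */
    for (int d = 1; d < n; d *= 2) {
        offset >>= 1;
        for (int thid = 0; thid < d; ++thid) {
            int ai = offset * (2 * thid + 1) - 1;
            int bi = offset * (2 * thid + 2) - 1;
            int t = temp[ai];
            temp[ai] = temp[bi];
            temp[bi] += t;
        }
    }
}
```

Applied to the selection flags {1, 1, 0, 1, 0, 0, 1, 1}, it produces {0, 1, 2, 2, 3, 3, 3, 4}. Those are the write offsets that selectedRecordsOffset should hold before breakpoint C.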
Consider an array a[i], i = 0, 1, ..., g, where g can be any given number, and a[0] = 1.
for a[1]=a[0]+1 to 1 do
for a[2]=a[1]+1 to 3 do
for a[3]=a[2]+1 to 5 do
...
for a[g]=a[g-1]+1 to 2g-1 do
#print a[1],a[2],...a[g]#
The problem is that every time we change the value of g, we need to modify the code (the loops above). This is not good code.
Recursion is one way to solve this (although I would love to see an iterative solution).
!!! Warning, untested code below !!!
template<typename A, unsigned int Size>
void recurse(A (&arr)[Size],int level, int g)
{
if (level > g)
{
// I am at the bottom level, do stuff here
return;
}
for (arr[level] = arr[level-1]+1; arr[level] <= 2 * level - 1; arr[level]++)
{
recurse(arr,level+1,g);
}
}
Then call with recurse(arr,1,g);
Imagine you are representing numbers with an array of digits. For example, 682 would be [6,8,2].
If you wanted to count from 0 to 999 you could write:
int n[3];
for (n[0] = 0; n[0] <= 9; ++n[0])
for (n[1] = 0; n[1] <= 9; ++n[1])
for (n[2] = 0; n[2] <= 9; ++n[2])
{ /* Do something with the three-digit number n here */ }
But when you want to count to 9999 you need an extra for loop.
Instead, you use the procedure for adding 1 to a number: increment the final digit, if it overflows move to the preceding digit and so on. Your loop is complete when the first digit overflows. This handles numbers with any number of digits.
You need an analogous procedure to "add 1" to your loop variables.
Increment the final "digit", that is a[g]. If it overflows (i.e. exceeds 2g-1) then move on to the next most-significant "digit" (a[g-1]) and repeat. A slight complication compared to doing this with numbers is that having gone back through the array as values overflow, you then need to go forward to reset the overflowed digits to their new base values (which depend on the values to the left).
The following C# code implements both methods and prints the arrays to the console.
static void Print(int[] a, int n, ref int count)
{
++count;
Console.Write("{0} ", count);
for (int i = 0; i <= n; ++i)
{
Console.Write("{0} ", a[i]);
}
Console.WriteLine();
}
private static void InitialiseRight(int[] a, int startIndex, int g)
{
for (int i = startIndex; i <= g; ++i)
a[i] = a[i - 1] + 1;
}
static void Main(string[] args)
{
const int g = 5;
// Old method
int count = 0;
int[] a = new int[g + 1];
a[0] = 1;
for (a[1] = a[0] + 1; a[1] <= 2; ++a[1])
for (a[2] = a[1] + 1; a[2] <= 3; ++a[2])
for (a[3] = a[2] + 1; a[3] <= 5; ++a[3])
for (a[4] = a[3] + 1; a[4] <= 7; ++a[4])
for (a[5] = a[4] + 1; a[5] <= 9; ++a[5])
Print(a, g, ref count);
Console.WriteLine();
count = 0;
// New method
// Initialise array
a[0] = 1;
InitialiseRight(a, 1, g);
int index = g;
// Loop until all "digits" have overflowed
while (index != 0)
{
// Do processing here
Print(a, g, ref count);
// "Add one" to array
index = g;
bool carry = true;
while ((index > 0) && carry)
{
carry = false;
++a[index];
if (a[index] > 2 * index - 1)
{
--index;
carry = true;
}
}
// Re-initialise digits that overflowed.
if (index != g)
InitialiseRight(a, index + 1, g);
}
}
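A C sketch of the same odometer idea follows, with the loop bounds passed in as an array. `upper` and `odometer_count` are hypothetical names, not from the question. This version checks all bounds after every reset, whereas the C# version prints the initial array before checking any bound. As a result, with the question's literal bound for a[1] ("to 1") this sketch visits nothing. With the bounds the C# old method uses (2, 3, 5, 7, 9) and g = 5, it visits 14 arrays.

```c
/* Returns 1 if a[from..g] all respect their upper bounds. */
static int within_bounds(const int *a, const int *upper, int from, int g)
{
    for (int k = from; k <= g; ++k)
        if (a[k] > upper[k])
            return 0;
    return 1;
}

/* Enumerates every a[1..g] with a[0] = 1 and a[k-1] < a[k] <= upper[k],
   i.e. the nested loops from the question, with the bound of level k
   taken from upper[k] (upper[0] is unused). a must hold g + 1 ints.
   Returns how many arrays were visited. */
long odometer_count(int g, const int *upper, int *a)
{
    long count = 0;
    a[0] = 1;
    for (int k = 1; k <= g; ++k)
        a[k] = a[k - 1] + 1;           /* smallest possible array */
    if (!within_bounds(a, upper, 1, g))
        return 0;                      /* even the smallest array is out of range */
    for (;;) {
        ++count;                       /* "do processing here" for a[1..g] */
        int k = g;
        while (k >= 1) {
            ++a[k];                    /* add one to digit k */
            for (int j = k + 1; j <= g; ++j)
                a[j] = a[j - 1] + 1;   /* reset the digits to its right */
            if (within_bounds(a, upper, k, g))
                break;                 /* no further carry needed */
            --k;                       /* overflow: carry into digit k - 1 */
        }
        if (k == 0)
            return count;              /* the first digit overflowed: done */
    }
}
```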
I'd say you don't want nested loops in the first place. Instead, you just want to call a suitable function that takes as arguments the current nesting level, the maximum nesting level (i.e. g), the start of the loop, and whatever it needs as context for the computation:
template <typename T>
void process(int level, int g, int start, T& context) {
if (level != g) {
for (int a(start + 1), end(2 * level - 1); a <= end; ++a) {
process(level + 1, g, a, context);
}
}
else {
// computation goes here
}
}