Accessing first and second values of a pair in a set - stl

#include <bits/stdc++.h>
using namespace std;
#define fast ios_base::sync_with_stdio(0);cin.tie(0);cout.tie(0);
#define LL long long int
#define pb push_back
#define mp make_pair
#define PII pair<int,int>
#define PLL pair<LL,LL>
#define PIS pair< int,string>
#define test int t;cin>>t;while(t--)
#define ff first // error: 'std::set<std::pair<long long int, long long int> >::iterator' has no member named 'first'
#define ss second // error: 'std::set<std::pair<long long int, long long int> >::iterator' has no member named 'second'
#define INF 1000000000
#define input(a,n) for(i=1;i<=n;i++)cin>>a[i];
#define output(a,n) for(i=1;i<=n;i++)cout<<a[i]<<" ";
vector< vector<LL> > v(3002, vector<LL>(3002,-1));
set< pair<LL, LL> > se;
set< pair<LL, LL> >::iterator it;
int vis[3002]={0};
void exmin(LL a)
{
    LL x, des, val, min = INF;
    for (x = 0; x < v[a].size(); x++)
    {
        if (v[a][x] < min)
        {
            val = v[a][x];
            des = x;
            min = val;
        }
    }
    se.insert(mp(val, des));
}
int main() {
    fast
    LL n, m, x, i, j, k, wt = 0, s;
    cin >> n >> m;
    vector<int> ve;
    for (x = 1; x <= n; x++)
        ve.pb(x);
    for (x = 0; x < m; x++)
    {
        cin >> i >> j >> k;
        if (v[i][j] != -1)
        {
            if (v[i][j] > k)
            {
                v[i][j] = k;
                v[j][i] = k;
            }
        }
        else
        {
            v[i][j] = k;
            v[j][i] = k;
        }
    }
    cin >> s;
    ve.erase(ve.begin() + s - 1);
    while (ve.size() != 0)
    {
        for (x = 0; x < v[s].size(); x++)
        {
            if (v[s][x] != -1 && vis[x] != 1)
            {
                exmin(x);
            }
        }
        /* for(x=0;x<p.size();x++)
        {
        }*/
        it = se.begin();
        wt = wt + *(it).ff;
        s = *(it).ss;
        vis[*(it).ss] = 1;
        ve.erase(ve.begin() + *(it).ss - 1);
        se.erase(it);
    }
    return 0;
}
Still facing errors.
I am trying to implement Prim's algorithm.
I was not able to include line numbers, so I have attached the errors as comments on the offending lines themselves.
Sorry, but I could not help including the "abhorrent part", because the error is in that part.
Edit:
Found my mistake; it was a syntax error.

Although you have a global set s:
set< pair<LL, LL> > s;
you have also defined a local variable s of type long long:
LL n,m,x,i,j,k,wt=0,s;
which hides the global s. Obviously a long long has neither a begin nor an erase member function, since it is a primitive type. Hence the errors:
it=s.begin(); // error: request for member 'begin' in 's', which is of non-class type 'long long int'
s.erase(it);  // error: request for member 'erase' in 's', which is of non-class type 'long long int'
To refer to the global s, use ::s, i.e.
::s.erase(it);
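A minimal standalone illustration of the hiding (a sketch; the names mirror the question):
#include <set>

std::set<int> s;                  // global container named s

int main()
{
    long long s = 0;              // local primitive that hides the global s
    // s.begin();                 // error: request for member 'begin' in 's',
    //                            // which is of non-class type 'long long int'
    ::s.insert(42);               // the :: qualifier reaches the global set
    return (int)s;
}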
And lastly I want to point out that this "contestese" coding style you're using is abhorrent. Feel free to use it during contests as much as you like, but please edit it away when you post questions on SO.

Found the error in my code. In
wt = wt + *(it).ff;
instead of *(it) it should have been (*it): the member-access operator . binds tighter than unary *, so *(it).ff parses as *(it.ff), and the iterator has no member named first.
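For reference, a small standalone sketch of the working forms:
#include <set>
#include <utility>

int main()
{
    std::set< std::pair<long long, long long> > se;
    se.insert(std::make_pair(1LL, 2LL));
    std::set< std::pair<long long, long long> >::iterator it = se.begin();

    long long a = (*it).first;    // dereference first, then member access
    long long b = it->second;     // idiomatic equivalent via operator->
    // *(it).first would not compile: '.' binds tighter than unary '*',
    // so it parses as *(it.first), and the iterator has no member 'first'.
    return (int)(a + b);
}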

Related

How to return string from __global__ function to main function in C CUDA [duplicate]

I am trying to concatenate two char arrays in CUDA, but nothing is working.
I tried to use:
char temp[32];
strcpy(temp, my_array);
strcat(temp, my_array_2);
When I used this in the kernel, I got the error: calling a __host__ function("strcpy") from a __global__ function("Process") is not allowed
After this, I tried to use these functions on the host, not in the kernel. Then there was no error, but after the concatenation I get strange symbols like ĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶĶ.
So, how can I concatenate two (or more) char arrays in CUDA?

So, how can I concatenate two (or more) char arrays in CUDA?

Write your own functions:
__device__ char * my_strcpy(char *dest, const char *src)
{
    int i = 0;
    do {
        dest[i] = src[i];
    } while (src[i++] != 0);
    return dest;
}

__device__ char * my_strcat(char *dest, const char *src)
{
    int i = 0;
    while (dest[i] != 0) i++;
    my_strcpy(dest + i, src);
    return dest;
}
And while we're at it, a strcmp can be written along the same lines (a minimal sketch):
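__device__ int my_strcmp(const char *s1, const char *s2)
{
    // Advance while the strings match and neither has ended.
    while (*s1 && (*s1 == *s2)) {
        s1++;
        s2++;
    }
    return *(const unsigned char *)s1 - *(const unsigned char *)s2;
}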
As the error message explains, you are trying to call host functions ("CPU functions") from a global kernel ("GPU function"). Within a global kernel you only have access to functions provided by the CUDA runtime API, which doesn't include the C standard library (where strcpy and strcat are defined).
You have to create your own str* functions according to what you want to do. Do you want to concatenate an array of chars in parallel, or do it serially in each thread?
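For example, a minimal sketch (the kernel signature here is assumed, not taken from the question) that concatenates serially in a single thread using the my_str* helpers above:
__global__ void Process(char *out, const char *a, const char *b)
{
    // A single thread does the serial concatenation; out must be large
    // enough to hold both strings plus the terminating '\0'.
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        my_strcpy(out, a);
        my_strcat(out, b);
    }
}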

type casting to unsigned long long in CUDA?

Basically what I want is a function that works like hiloint2uint64(): just join two 32-bit integers and reinterpret the outcome as a uint64.
I cannot find any function in CUDA that can do this; anyhow, is there any PTX code that can do that kind of type cast?
You can define your own function like this:
__host__ __device__ unsigned long long int hiloint2uint64(int h, int l)
{
    int combined[] = { h, l };
    return *reinterpret_cast<unsigned long long int*>(combined);
}
Maybe a bit late by now, but probably the safest way to do this is to do it "manually" with bit shifts and an or (note the unsigned 32-bit intermediates: converting a negative int straight to uint64_t would sign-extend into the high bits):
uint32_t ui_h = h;
uint32_t ui_l = l;
return (uint64_t(ui_h) << 32) | uint64_t(ui_l);
Note that the solution presented in the other answer isn't safe, because the array of ints might not be 8-byte aligned (and shifting some bits is faster than a memory read/write anyway).
Use uint2 (but define the temporary variable as a 64-bit value, unsigned long long int) instead of an array, to be sure of the alignment.
Be careful about the order of l and h.
__host__ __device__ __forceinline__ unsigned long long int hiloint2uint64(unsigned int h, unsigned int l)
{
    unsigned long long int result;
    uint2& src = *reinterpret_cast<uint2*>(&result);
    src.x = l;
    src.y = h;
    return result;
}
CUDA registers are 32 bits wide anyway. In the best case the compiler won't need any extra code; in the worst case it has to reorder the registers by moving a 32-bit value.
Godbolt example of how well it optimizes: https://godbolt.org/z/3r9WYK9e7

unsigned long long to binary

I am trying to check the set bits of an unsigned long long in C++ using the algorithm below, which only checks whether a bit is set or not. But my problem is that the answer I get is wrong. Please help me understand how an unsigned long long is stored in binary.
Code:
#include <stdio.h>
#include <iostream>
#define CHECK_BIT(var,pos) ((var) & (1<<(pos)))
using namespace std;

int main()
{
    int pos = sizeof(unsigned long long) * 8;
    unsigned long long a;
    cin >> a;
    pos = pos - 1;
    while (pos >= 0)
    {
        if (CHECK_BIT(a, pos))
            cout << "1";
        else
            cout << "0";
        --pos;
    }
}
Input :
1000000000000000000
Output:
1010011101100100000000000000000010100111011001000000000000000000
Expected Output:
110111100000101101101011001110100111011001000000000000000000
Similarly for another input:
14141
Output :
0000000000000000001101110011110100000000000000000011011100111101
Expected Output:
11011100111101
In the second example (in fact, for any small number), the binary pattern just repeats itself after 32 bits.
I think what you have is an issue in the bit-check macro: the literal 1 is a plain int, so 1<<(pos) misbehaves once pos reaches 32, which is why the low 32 bits repeat. Please replace it with an unsigned 64-bit literal:
#define CHECK_BIT(var,pos) ((var) & (1ULL<<(pos)))
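To sanity-check the fix, a minimal sketch (the hard-coded input 14141 is taken from the question):
#include <cstdio>
#define CHECK_BIT(var,pos) ((var) & (1ULL<<(pos)))

int main()
{
    unsigned long long a = 14141ULL;
    // Bits 63..32 now print as zeros instead of repeating bits 31..0.
    for (int pos = 63; pos >= 0; --pos)
        putchar(CHECK_BIT(a, pos) ? '1' : '0');
    putchar('\n');
}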

Function return not returning a value to main

So, this is my first assignment fiddling with functions in C++, which I thought I understood, seeing as they're rather similar to methods in C#. But although main calls my function fine, and the function runs and returns to main, it doesn't send back the value it read. I'm not exactly sure what I've done wrong; it appears to be set up appropriately (i.e., like the sample code in my book for a return).
Here's the main code...
#include <iostream>
#include <cmath>
#include <string>
using namespace std;

int main()
{
    double retrieveSales = 0, sales1 = 0, sales2 = 0, sales3 = 0, sales4 = 0;
    string division, division2, division3, division4;
    double getSales(string);

    cout << "Enter division.\n";
    cin >> division;
    getSales(division);
    retrieveSales = sales1;

    cout << "Enter second division.\n";
    cin >> division2;
    getSales(division2);
    retrieveSales = sales2;
    cout << "Print Sales" << sales2;

    cout << "Enter third division.\n";
    cin >> division3;
    getSales(division3);
    retrieveSales = sales3;
    cout << "Print Sales" << sales3;

    cout << "Enter fourth division.\n";
    cin >> division4;
    getSales(division4);
    retrieveSales = sales4;
    cout << "Print Sales" << sales4;

    system("pause");
    return 0;
}
And here's the code for the function that it calls
#include <iostream>
#include <cmath>
#include <string>
using namespace std;

double getSales(string division)
{
    double retrieveSales;
    cout << "What are the sales for " << division << endl;
    cin >> retrieveSales;
    while (retrieveSales < 0.0)
    {
        cout << "Please enter a valid sales amount no less than $0.00.\n";
        cin >> retrieveSales;
    }
    system("pause");
    return retrieveSales;
}
How do I get my function to return the value of retrieveSales to retrieveSales in the main?
In this line:
getSales(division);
You are discarding the return value; you need to assign it to a variable:
sales1 = getSales(division);
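Applying the same fix to each of the four calls in main (a sketch reusing the question's own variable names):
sales1 = getSales(division);
sales2 = getSales(division2);
sales3 = getSales(division3);
sales4 = getSales(division4);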

Thrust Complex Transform of 3 different size vectors

Hello, I have this loop in C++, and I was trying to convert it to Thrust, but I am not getting the same results.
Any ideas?
Thank you.
C++ Code
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        values[i] = values[i] + (binv[i*n+j] * d[j]);
Thrust Code
thrust::fill(values.begin(), values.end(), 0);
thrust::transform(
    make_zip_iterator(make_tuple(
        thrust::make_permutation_iterator(values.begin(),
            thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n))),
        binv.begin(),
        thrust::make_permutation_iterator(d.begin(),
            thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexModFunctor(n))))),
    make_zip_iterator(make_tuple(
        thrust::make_permutation_iterator(values.begin(),
            thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n))) + n,
        binv.end(),
        thrust::make_permutation_iterator(d.begin(),
            thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexModFunctor(n))) + n)),
    thrust::make_permutation_iterator(values.begin(),
        thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n))),
    function1());
Thrust Functions
struct IndexDivFunctor : thrust::unary_function<int, int>
{
    int n;
    IndexDivFunctor(int n_) : n(n_) {}

    __host__ __device__
    int operator()(int idx)
    {
        return idx / n;
    }
};

struct IndexModFunctor : thrust::unary_function<int, int>
{
    int n;
    IndexModFunctor(int n_) : n(n_) {}

    __host__ __device__
    int operator()(int idx)
    {
        return idx % n;
    }
};

struct function1
{
    template <typename Tuple>
    __host__ __device__
    double operator()(Tuple v)
    {
        return thrust::get<0>(v) + thrust::get<1>(v) * thrust::get<2>(v);
    }
};
To begin with, some general comments. Your loop
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        v[i] = v[i] + (B[i*n+j] * d[j]);
is the equivalent of the standard BLAS gemv operation (v = B·d + v) with the matrix stored in row-major order. The optimal way to do this on the device would be to use CUBLAS, not something constructed out of Thrust primitives.
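For illustration, a minimal sketch of that CUBLAS route (the wrapper function and its name are mine; CUBLAS stores matrices column-major, so the row-major B is seen as its transpose and we request CUBLAS_OP_T to get B back):
#include <cublas_v2.h>

// Computes v = B*d + v for an n-by-n row-major matrix B on the device.
void gemv_rowmajor(cublasHandle_t handle, int n,
                   const double *B, const double *d, double *v)
{
    const double alpha = 1.0;
    const double beta  = 1.0;   // beta = 1 accumulates into v
    cublasDgemv(handle, CUBLAS_OP_T, n, n, &alpha, B, n, d, 1, &beta, v, 1);
}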
Having said that, there is absolutely no way the Thrust code you posted is ever going to do what your serial code does. The errors you are seeing are not a result of floating point associativity. Fundamentally, thrust::transform applies the supplied functor to every element of the input iterator and stores the result on the output iterator. To yield the same result as the loop you posted, the thrust::transform call would need to perform (n*n) operations of the fmad functor you posted. Clearly it does not. Further, there is no guarantee that thrust::transform would perform the summation/reduction in a fashion that is safe from memory races.
The correct solution is probably going to be something like:
Use thrust::transform to compute the (n*n) products of the elements of B and d
Use thrust::reduce_by_key to reduce the products into partial sums, yielding Bd
Use thrust::transform to add the resulting matrix-vector product to v to yield the final result.
In code, firstly define a functor like this:
struct functor
{
    template <typename Tuple>
    __host__ __device__
    double operator()(Tuple v)
    {
        return thrust::get<0>(v) * thrust::get<1>(v);
    }
};
Then do the following to compute the matrix-vector multiplication
typedef thrust::device_vector<int> iVec;
typedef thrust::device_vector<double> dVec;
typedef thrust::counting_iterator<int> countIt;
typedef thrust::transform_iterator<IndexDivFunctor, countIt> columnIt;
typedef thrust::transform_iterator<IndexModFunctor, countIt> rowIt;

// Assuming the following allocations on the device
dVec B(n*n), v(n), d(n);

// transformation iterators mapping to vector rows and columns
columnIt cv_begin = thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n));
columnIt cv_end   = cv_begin + (n*n);
rowIt rv_begin = thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexModFunctor(n));
rowIt rv_end   = rv_begin + (n*n);

dVec temp(n*n);
thrust::transform(
    make_zip_iterator(make_tuple(
        B.begin(),
        thrust::make_permutation_iterator(d.begin(), rv_begin))),
    make_zip_iterator(make_tuple(
        B.end(),
        thrust::make_permutation_iterator(d.end(), rv_end))),
    temp.begin(),
    functor());

iVec outkey(n);
dVec Bd(n);
thrust::reduce_by_key(cv_begin, cv_end, temp.begin(), outkey.begin(), Bd.begin());
thrust::transform(v.begin(), v.end(), Bd.begin(), v.begin(), thrust::plus<double>());
Of course, this is a terribly inefficient way to do the computation compared to a purpose-designed matrix-vector multiplication routine like dgemv from CUBLAS.
How much do your results differ? Is it a completely different answer, or does it differ only in the last digits? Is the loop executed only once, or is it part of some iterative process?
Floating point operations, especially those that repeatedly add up or multiply certain values, are not associative, because of precision issues. Moreover, if you use fast-math optimisations, the operations may not be IEEE compliant.
For starters, check out this wikipedia section on floating-point numbers: http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems
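A tiny illustration of the effect (a minimal sketch):
#include <cstdio>

int main()
{
    double a = 1e16, b = -1e16, c = 0.1;
    // Doubles near 1e16 are spaced 2 apart, so adding 0.1 to 1e16 is lost.
    printf("%.17g\n", (a + b) + c);  // prints 0.1...
    printf("%.17g\n", a + (b + c));  // prints 0: c was absorbed into b
}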