Storing and using numbers with more than 32 digits in Java - binary

How can I use numbers with more than 32 digits in this code? The code is supposed to multiply two binary numbers with more than 32 digits, and even long won't work. I don't know how to use BigInteger in this code. Can anyone help? Thanks.
public static void main(String [] args)
{
    long a , b ;
    Scanner scanner = new Scanner (System.in);
    a = scanner.nextLong();
    b = scanner.nextLong() ;
    long sumA = 0 ;
    long sumB = 0 ;
    double i = 0;
    while ( a != 0 || b != 0)
    {
        sumA += (a % 10) * Math.pow( 2.0 , i ) ;
        sumB += (b % 10) * Math.pow( 2.0 , i ) ;
        a /= 10 ;
        b /= 10 ;
        i++ ;
    }
    a = sumA ;
    b = sumB ;
    long c = a * b ;
    long temp = 0 ;
    for (int k = 0 ; c!=0 ; k++)
    {
        temp +=( Math.pow(10.0, k) * (c % 2) );
        c /= 2 ;
    }
    System.out.println(temp) ;
}

You want http://download.oracle.com/javase/6/docs/api/java/math/BigInteger.html.
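To make the idea concrete, here is a minimal sketch in Python, whose built-in integers are arbitrary precision in the same way BigInteger is. The function name multiply_binary is made up for illustration. The key change from the code in the question is to read each operand as text and parse it in base 2, rather than reading it as a decimal long and converting digit by digit. In Java the analogous steps would presumably be new BigInteger(s, 2), multiply, and toString(2), with the input read via scanner.next() instead of nextLong().

```python
def multiply_binary(a_bits: str, b_bits: str) -> str:
    """Multiply two binary numerals given as strings of '0'/'1' digits."""
    a = int(a_bits, 2)          # parse base-2 text into an arbitrary-precision integer
    b = int(b_bits, 2)
    return format(a * b, "b")   # render the product back as base-2 text


if __name__ == "__main__":
    x = "1" * 40                # 40 binary digits, i.e. 2**40 - 1
    y = "101"                   # 5 in binary
    print(multiply_binary(x, y))
```

Because the operands never pass through a fixed-width type, the number of digits is limited only by memory.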

Related

Binary division with decimals

I wrote this small program to divide two 8-bit fixed-point numbers (4 bits before and 4 bits after the comma):
void main()
{
    // All numbers are in the format xxxx,xxxx
    unsigned char a = 0b00010000; // Dividend = 1,0
    unsigned char b = 0b00100000; // Divisor = 2,0
    unsigned char r = 0; // Result
    // Align divisor to the left
    while ((b & 0b10000000) == 0)
    {
        b = (b << 1) & 0b11111110;
    }
    // Calculate all 8 bits
    for (unsigned char i = 0; i < 8; ++i)
    {
        if (a < b)
        {
            // Append 0 to the result
            r = (r << 1) & 0b11111110;
        }
        else
        {
            // Append 1 to the result
            r = (r << 1) | 1;
            a = a - b;
        }
        b = b >> 1;
    }
    printBinary(r);
    getchar();
}
But all my results are shifted one digit too far to the left.
So my results are twice as big as they should be.
Even when I try to calculate 1 / 2 by hand, I make the same mistake:
0001,0000 / 0010,0000 = 0001,0000
0001,0000 / 1000,0000
-1000,0000 -> 0
0001,0000 / 0100,0000
-0100,0000 -> 0
0001,0000 / 0010,0000
-0010,0000 -> 0
0001,0000 / 0001,0000
-0001,0000 -> 1
0000,0000 / 0000,1000
-0000,1000 -> 0
0000,0000 / 0000,0100
-0000,0100 -> 0
0000,0000 / 0000,0010
-0000,0010 -> 0
0000,0000 / 0000,0001
-0000,0001 -> 0
What's my mistake?
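For comparison, here is a minimal sketch of one common way to divide numbers in this xxxx,xxxx format: widen the dividend by the number of fractional bits first, then do an ordinary integer division. This is not the left-aligning loop from the question. The names fixed_div, fixed_div_bitwise and to_fixed_string are hypothetical, and the sketch assumes unsigned values and a non-zero divisor. The bitwise variant is a plain restoring division over the widened dividend, included only to show that the bit-by-bit procedure gives the same quotient.

```python
FRAC_BITS = 4   # bits after the comma
WIDTH = 8       # total bits of the xxxx,xxxx format


def fixed_div(a: int, b: int) -> int:
    """Divide two raw fixed-point values by widening the dividend first."""
    return (a << FRAC_BITS) // b


def fixed_div_bitwise(a: int, b: int) -> int:
    """Same quotient via restoring (shift-and-subtract) division."""
    dividend = a << FRAC_BITS            # needs up to WIDTH + FRAC_BITS bits
    remainder = 0
    quotient = 0
    for bit in range(WIDTH + FRAC_BITS - 1, -1, -1):
        # bring down the next dividend bit, most significant first
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= b:               # divisor fits: emit a 1 and subtract
            remainder -= b
            quotient |= 1
    return quotient


def to_fixed_string(raw: int) -> str:
    """Format the low WIDTH bits of a raw value as xxxx,xxxx."""
    bits = format(raw & ((1 << WIDTH) - 1), "0{}b".format(WIDTH))
    return bits[:WIDTH - FRAC_BITS] + "," + bits[WIDTH - FRAC_BITS:]


if __name__ == "__main__":
    one = 0b00010000   # 1,0
    two = 0b00100000   # 2,0
    print(to_fixed_string(fixed_div(one, two)))
```

With the operands from the question (1,0 divided by 2,0) both functions return the raw value 0000,1000, which is 0,5. The quotient can need more than 8 bits when the divisor is small, so a real implementation would have to decide how to handle that overflow.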

Miller–Rabin SPOJ WA

I am trying to implement Miller-Rabin for the first time. My code gives the correct answer for all the test cases I tried, but on SPOJ it still gets a wrong answer.
Problem Statement: I am supposed to print "YES" if the entered number is prime, otherwise "NO".
Please help:
Problem Link: http://www.spoj.com/problems/PON/
CODE:
#include<stdio.h>
#include<stdlib.h>
#include<time.h>
#define LL long long
LL expo(LL a,LL b,LL c)
{
    LL x=1,y=a;
    if(b==0)
        return 1;
    while(b)
    {
        if(b%2==1)
            x=(x*y)%c;
        y=(y*y)%c;
        b=b/2;
    }
    return x;
}
int main()
{
    LL t,s,x,a,n,prime,temp;
    scanf("%lld",&t);
    srand(time(NULL));
    while(t--)
    {
        scanf("%lld",&n);
        if(n<2)
            puts("NO");
        else if(n==2)
            puts("YES");
        else if(n%2==0)
            puts("NO");
        else
        {
            s=n-1;
            prime=1;
            while(s%2==0)
                s=s/2;
            for(int i=0;i<20;i++)
            {
                a=rand()%(n-1)+1;
                x=expo(a,s,n);
                temp=s;
                while((temp!=n-1)&&(x!=1)&&(x!=n-1))
                {
                    x=(x*x)%n;
                    temp*=2;
                }
                if((x!=n-1)&&(temp%2==0))
                {
                    prime=0;
                    break;
                }
            }
            if(prime==0)
                puts("NO");
            else
                puts("YES");
        }
    }
    return 0;
}
Keep in mind that puts appends a newline character '\n' to the string you pass it. You can try printf instead.
I think your calculation of s and d is incorrect:
function isStrongPseudoprime(n, a)
    d := n - 1; s := 0
    while d % 2 == 0
        d := d / 2; s := s + 1
    t := powerMod(a, d, n)
    if t == 1 return ProbablyPrime
    while s > 0
        if t == n - 1 return ProbablyPrime
        t := (t * t) % n
        s := s - 1
    return Composite
I discuss the Miller-Rabin method in an essay at my blog.
You are getting a wrong answer because of integer overflow: you multiply two long long values, and the product may not fit in a single long long.
Here is a solution in Python that avoids the issue:
import random
_mrpt_num_trials = 25 # number of bases to test
def is_probable_prime(n):
    assert n >= 2
    # special case 2
    if n == 2:
        return True
    # ensure n is odd
    if n % 2 == 0:
        return False
    # write n-1 as 2**s * d
    # repeatedly try to divide n-1 by 2
    s = 0
    d = n - 1
    while True:
        quotient, remainder = divmod(d, 2)
        if remainder == 1:
            break
        s += 1
        d = quotient
    assert(2 ** s * d == n - 1)
    def try_composite(a):
        if pow(a, d, n) == 1:
            return False
        for i in range(s):
            if pow(a, 2 ** i * d, n) == n - 1:
                return False
        return True
    for _ in range(_mrpt_num_trials):
        a = random.randrange(2, n)
        if try_composite(a):
            return False
    return True
for i in range(int(input())):
    a = int(input())
    if is_probable_prime(a):
        print("YES")
    else:
        print("NO")
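The Python version sidesteps the overflow only because Python integers grow as needed. In a fixed-width language the usual remedy is a modular multiplication that never forms the full product. Below is a minimal sketch of the double-and-add approach, written in Python purely to illustrate the arithmetic. The names mulmod and powmod are hypothetical helpers, not part of the code above. Every intermediate value stays below 2 * m, so for m below 2**63 it would fit in an unsigned 64-bit integer.

```python
def mulmod(a: int, b: int, m: int) -> int:
    """Compute (a * b) % m without forming any value >= 2 * m."""
    a %= m
    b %= m
    result = 0
    while b > 0:
        if b & 1:
            result = (result + a) % m   # result + a < 2 * m
        a = (a + a) % m                 # a + a < 2 * m
        b >>= 1
    return result


def powmod(base: int, exponent: int, m: int) -> int:
    """Square-and-multiply exponentiation built on mulmod."""
    result = 1 % m
    base %= m
    while exponent > 0:
        if exponent & 1:
            result = mulmod(result, base, m)
        base = mulmod(base, base, m)
        exponent >>= 1
    return result
```

In the C code from the question, the lines x=(x*y)%c, y=(y*y)%c and x=(x*x)%n are the places where such a helper would be substituted. This costs roughly 64 additions per multiplication, which is one trade-off to weigh against using a wider integer type where the compiler offers one.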

Multi GPU performance degrade when allocated memory increases

I've tested the following on a GTX 690 GPU with 4GB RAM in Windows 7 x64, Visual C++ 10:
I've written a function that receives two vectors and adds them into a third vector. The task is split across the 2 GPU devices. I gradually increased the vector size to benchmark GPU performance. The required time increases linearly with vector size up to a certain point and then jumps abruptly. When I disable either of the GPU cores, the required time stays linear up to the limit of available memory. I've enclosed a diagram displaying required time versus allocated memory.
You can see the speed diagram here: Speed Comparison Diagram!
Can you tell me what is wrong?
Bests,
Ramin
This is my code:
unsigned BenchMark( unsigned VectorSize )
{
    unsigned * D[ 2 ][ 3 ] ;
    for ( int i = 0 ; i < 2 ; i++ )
    {
        cudaSetDevice( i ) ;
        for ( int j = 0 ; j < 3 ; j++ )
            cudaMalloc( & D[ i ][ j ] , VectorSize * sizeof( unsigned ) ) ;
    }
    unsigned uStartTime = clock() ;
    // TEST
    for ( int i = 0 ; i < 2 ; i++ )
    {
        cudaSetDevice( i ) ;
        AddKernel<<<VectorSize/256,256>>>(
            D[ i ][ 0 ] ,
            D[ i ][ 1 ] ,
            D[ i ][ 2 ] ,
            VectorSize ) ;
    }
    cudaDeviceSynchronize() ;
    cudaSetDevice( 0 ) ;
    cudaDeviceSynchronize() ;
    unsigned uEndTime = clock() ;
    for ( int i = 0 ; i < 2 ; i++ )
    {
        cudaSetDevice( i ) ;
        for ( int j = 0 ; j < 3 ; j++ )
            cudaFree( D[ i ][ j ] ) ;
    }
    return uEndTime - uStartTime ;
}
__global__ void AddKernel(
    const Npp32u * __restrict__ pSource1 ,
    const Npp32u * __restrict__ pSource2 ,
    Npp32u * __restrict__ pDestination ,
    unsigned uLength )
{
    unsigned x = blockIdx.x * blockDim.x + threadIdx.x ;
    if ( x < uLength )
        pDestination[ x ] = pSource1[ x ] + pSource2[ x ] ;
}
I found the answer. The problem occurred because SLI was active. I disabled it and now it works smoothly.

Distribute the threads between blocks in CUDA

I'm working on a project in CUDA. At first I used a single block of dimension 8*8, the same size as my matrix, and calculated the index as follows:
int idx = blockIdx.x * blockDim.x + threadIdx.x;
int idy = blockIdx.y * blockDim.y + threadIdx.y;
This gave me the correct answer. After that I wanted to distribute the threads across blocks to measure the performance, so I set the grid dimensions to (2,1) and the block dimensions to (4,8).
When I trace the code by hand, the formula above seems to give the correct index without any change. But when I run the program, the screen hangs and the results are all zero.
What did I do wrong, and how can I fix this?
This is the kernel function
__global__ void cover_fault(int *a,int *b, int *c, int *d, int *mulFV1, int *mulFV2, int *checkDalU1, int *checkDalU2, int N)
{
    //Fig.2
    __shared__ int f[9][9];
    __shared__ int compV1[9],compV2[9];
    int dalU1[9] , dalU2[9];
    int Ra=2 , Ca=2;
    for (int i = 0 ; i < N ; i++)
        for (int j = 0 ; j < N ; j++)
            f[i][j]=0;
    f[3][0] = 1;
    f[0][2] = 1;
    f[0][6] = 1;
    f[3][7] = 1;
    f[2][4] = 1;
    f[6][4] = 1;
    f[7][1] = 1;
    int t =0 ,A = 1,B = 1 , UTP = 5 , LTP = -5 , U_max = 40 , U_min = -160;
    bool flag = true;
    int sumV1, sumV2;
    int checkZero1 , checkZero2;
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int idy = blockIdx.y * blockDim.y + threadIdx.y;
    while ( flag == true)
    {
        if ( c[idy] == 0 )
            compV1[idy] = 1;
        else if ( c[idy]==1)
            compV1[idy] = 0 ;
        if ( d[idy] == 0 )
            compV2[idy] = 1;
        else if ( d[idy]==1 )
            compV2[idy] = 0 ;
        sumV1 = reduce ( c, N );
        sumV2 = reduce ( d, N );
        if (idx<N && idy <N)
        {
            if(idx==0)
                mulFV1[idy]=0;
            if(idy==0)
                mulFV2[idx]=0;
            __syncthreads();
            atomicAdd(&(mulFV1[idy]),f[idy][idx]*compV2[idx]);
            atomicAdd(&(mulFV2[idx]),f[idy][idx]*compV1[idy]);
        }
        dalU1[idy] = ( -1*A*( sumV1 - Ra )) + (B * mulFV1[idy] * compV1[idy]) ;
        dalU2[idy] = ( -1*A*( sumV2 - Ca )) + (B * mulFV2[idy] * compV2[idy]) ;
        a[idy] = a[idy] + dalU1[idy];
        b[idy] = b[idy] + dalU2[idy];
        if ( a[idy] > U_max )
            a[idy] = U_max;
        else
            if (a[idy] < U_min )
                a[idy] = U_min;
        if ( b[idy] > U_max )
            b[idy] = U_max;
        else
            if (b[idy] < U_min )
                b[idy] = U_min;
        if (dalU1[idy]==0)
            checkDalU1[idy]=0;
        else
            checkDalU1[idy]=1;
        if (dalU2[idy]==0)
            checkDalU2[idy]=0;
        else
            checkDalU2[idy]=1;
        __syncthreads();
        checkZero1 = reduce(checkDalU1,N);
        checkZero2 = reduce(checkDalU2,N);
        if ( checkZero1==0 && checkZero2==0)
            flag = false;
        else
        {
            if ( a[idy] > UTP )
                c[idy] = 1;
            else
                if ( a[idy] < LTP )
                    c[idy] = 0 ;
            if ( b[idy] > UTP )
                d[idy] = 1;
            else
                if ( b[idy] < LTP )
                    d[idy] = 0 ;
            t++;
        }//end else
        sumV1=0;
        sumV2=0;
        mulFV1[idy]=0;
        mulFV2[idy]=0;
    } //end while
}//end function
In your index computation, idx gives you the column index and idy the row index. Are you accessing your matrix as M[idy][idx]?
CUDA threads are laid out on a Cartesian grid: X is horizontal and Y is vertical. So the thread at position (x, y) = (0, 1) corresponds to element M[1][0] of the actual matrix.
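The index arithmetic itself can be checked on the host. The following is a small Python sketch that only models the formula blockIdx * blockDim + threadIdx for a given launch configuration; the function names are made up for illustration. It does not model shared memory, synchronization, or anything else that could make the kernel hang.

```python
def global_thread_indices(grid_dim, block_dim):
    """List the (idx, idy) pair each thread would compute in a 2D launch."""
    grid_x, grid_y = grid_dim
    block_x, block_y = block_dim
    pairs = []
    for by in range(grid_y):
        for bx in range(grid_x):
            for ty in range(block_y):
                for tx in range(block_x):
                    idx = bx * block_x + tx   # column, runs along X
                    idy = by * block_y + ty   # row, runs along Y
                    pairs.append((idx, idy))
    return pairs


def row_major_offset(idx, idy, width):
    """Offset of M[idy][idx] in a flat row-major array of the given width."""
    return idy * width + idx


if __name__ == "__main__":
    one_block = global_thread_indices((1, 1), (8, 8))
    two_blocks = global_thread_indices((2, 1), (4, 8))
    print(sorted(one_block) == sorted(two_blocks))
```

Under this model, a (2,1) grid of (4,8) blocks produces the same 64 index pairs as a single 8*8 block, which matches the asker's hand trace. The difference between the two launches would then lie in what the blocks share, not in the formula.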

Given a 16-bit integer, compare each group of four bits and output the maximum one

E.g. 0xFE10 should output 0xF (1111 in binary).
This is a Qualcomm interview question. This is my idea so far:
I am calling the 16-bit integer:
int num = /*whatever the number is*/
Have four bit masks:
int zeroTo4 = (num & 0x000F);
int fiveTo5 = (num & 0x00F0) >> 4;
int eightTo12 = (num & 0x0F00) >> 8;
int twelveTo16 = (num & 0xF000) >> 12;
int printbit = zeroTo4;
if( fiveTo5 > printbit )
    printbit = fiveTo5;
if( eightTo12 > printbit )
    printbit = eightTo12;
if( twelveTo16 > printbit )
    printbit = twelveTo16;
printf( "Largest bit of %X is %1X\n", num, printbit );
However, I'm pretty sure there's a simpler and easier way to do this. Can anyone help me out? Thanks!
int max4(int j)
{
    int ret=0;
    while(j>0)
    {
        if( (j&0xf) > ret ) ret=j&0xf;
        j>>=4;
    }
    return ret;
}
Some may prefer:
int max4(int j)
{
    int ret=0;
    do if( (j&0xf) > ret ) ret=j&0xf;
    while((j>>=4)>0);
    return ret;
}
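The same idea, shifting by four and masking with 0xF, can be sketched in Python for quick experimentation. The name max_nibble and the width parameter are assumptions for illustration. Unlike the loops above, this version always looks at a fixed number of nibbles instead of stopping once the remaining value is zero.

```python
def max_nibble(value: int, width: int = 16) -> int:
    """Return the largest 4-bit group among the low `width` bits of value."""
    return max((value >> shift) & 0xF for shift in range(0, width, 4))


if __name__ == "__main__":
    print(format(max_nibble(0xFE10), "X"))
```

For the example from the question, max_nibble(0xFE10) gives 0xF.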