Need a formula to get total LUN size using lunSizeLow and lunSizeHigh SNMP objects - binary

I have 2 SNMP Objects/OIDs. Below are the details:
Object1:
Name: lunSizeLow
OID: 1.3.6.1.4.1.43906.1.4.3.2.3.1.9
Description: `LUN` size in bytes - low order bytes
Object2:
Name: lunSizeHigh
OID: 1.3.6.1.4.1.43906.1.4.3.2.3.1.10
Description: `LUN` size in bytes - high order bytes
My requirement:
I want to monitor LUN size through some script. But i didn't found any SNMP object, which can give total LUN size directly. I found 2 separate objects (lunSizeLow and lunSizeHigh) to get LUN total size, so i need a formula to get total LUN size using these 2 low order and high order SNMP objects (lunSizeLow and lunSizeHigh).
I gone through many articles over internet and i found couple of formulas in community.hpe.com.
But I'm not sure which one is correct.
Formula 1:
Max unsigned number that can be stored in 32bits counter is 4294967295.
Total size would be: LOW_ORDER_BYTES + HIGH_ORDER_BYTES * 4294967296
Formula 2:
Total size in GB is LOW_ORDER_BYTES / 1073741824 + HIGH_ORDER_BYTES * 4
Could any one help me to get correct formula.

Most languages will have the bit-shift operator, allowin you to do something similar to the below (pseudo-Java):
long myBigInteger = lunSizeHigh
myBigInteger << 32 # Shifts the high bits 32 positions to the left, into the high half of the long
myBigInteger = myBigInteger + lunSizeLow
This has two advantages over multiplying:
Bit shifting is often faster than multiplication, even though most compilers would optimize that particular multiplication into a bit shift anyway.
It is easier to read the code and understand why this would provide the correct answer, given the description from the MIB. Magic numbers should be avoided where possible.
That aside, putting some numbers into the Windows Calculator (using Programmer Mode) and trying formula 1, we can see that it works.
Now, you don't specify what language or environment you're working in, and in some languages you won't have any number type that supports the size of numbers you want to manipulate. (Same reason that this number had to be split into two counters to begin with - it's larger than the largest number representation available on some (primitive) platforms.) If you want to do it using multiplication, you'll have to make sure your implementation language can do better.

Related

Minimum swaps to create an alternating binary string of ones and zeros

Given a binary string, that is it contains only 0s and 1s (number of zeros equals the number of ones) We need to make this string a sequence of alternate characters by swapping some of the bits, our goal is to minimize the number swaps.
For example, for the string "00011011" the minimum number of swaps is 2, one way to do it is:
1) swap the bits : 00011011 --->> 00010111
2) swap the bits(after the first swap) : 00010111 --->> 01010101
Note that if we are given the string "00101011" we can turn it into an alternate string starting with 0 (that requires 3 swaps) and also into alternate string starting with 1 ( that requires one swap - the first and the last bits ).
So the minimum in this case is one swap.
The end goal is to return the minimum number of swaps for a given string of ones and zeros.
What is the most efficient way to solve it?
Trivial stuff is trivial.
xor your sequence with 010101010…. Note: if the sequence was already "correct", you'd get an all-0 bit stream (also do with the inverse if you don't care whether you start with 0 or 1). XOR is very efficient on modern CPUs. I can do 512 bits at a time on my Xeon, and I think it takes 2 cycles.
count the ones (#ones). On modern x87, and a lot of other ISAs, there's a POPCNT instruction that makes that extremely efficient. Throughput one, 64 bits at a time on my CPU.
swaps necessary is #ones/2 (obvious), or #zeros/2, whichever is less.
Note: Code published on this site is CC-by-SA. That means that you're legally required to state the URL of this answer and my username when you use this in your software, or your homework, or your exam, or your assignment.

Compressing a binary matrix

We were asked to find a way to compress a square binary matrix as much as possible, and if possible, to add redundancy bits to check and maybe correct errors.
The redundancy thing is easy to implement in my opinion. The complicated part is compressing the matrix. I thought about using run-length after reshaping the matrix to a vector because there will be more zeros than ones, but I only achieved a 40bits compression (we are working on small sizes) although I thought it'd be better.
Also, after run-length an idea was Huffman coding the matrix, but a dictionary must be sent in order to recover the original information.
I'd like to know what would be the best way to compress a binary matrix?
After reading some comments, yes #Adam you're right, the 14x14 matrix should be compressed in 128bits, so if I only use the coordinates (rows&cols) for each non-zero element, still it would be 160bits (since there are twenty ones). I'm not looking for an exact solution but for a useful idea.
You can only talk about compressing something if you have a distribution and a representation. That's the issue of the dictionary you have to send along: you always need some sort of dictionary of protocol to uncompress something. It just so happens that things like .zip and .mpeg already have those dictionaries/codecs. Even something as simple as Huffman-encoding is an algorithm; on the other side of the communication channel (you can think of compression as communication), the other person already has a bit of code (the dictionary) to perform the Huffman decompression scheme.
Thus you cannot even begin to talk about compressing something without first thinking "what kinds of matrices do I expect to see?", "is the data truly random, or is there order?", and if so "how can I represent the matrices to take advantage of order in the data?".
You cannot compress some matrices without increasing the size of other objects (by at least 1 bit). This is bad news if all matrices are equally probable, and you care equally about them all.
Addenda:
The answer to use sparse matrix machinery is not necessarily the right answer. The matrix could for example be represented in python as [[(r+c)%2 for c in range (cols)] for r in range(rows)] (a checkerboard pattern), and a sparse matrix wouldn't compress it at all, but the Kolmogorov complexity of the matrix is the above program's length.
Well, I know every matrix will have the same number of ones, so this is kind of deterministic. The only think I don't know is where the 1's will be. Also, if I transmit the matrix with a dictionary and there are burst errors, maybe the dictionary gets affected so... wouldnt be the resulting information corrupted? That's why I was trying to use lossless data compression such as run-length, the decoder just doesnt need a dictionary. --original poster
How many 1s does the matrix have as a fraction of its size, and what is its size (NxN -- what is N)?
Furthermore, this is an incorrect assertion and should not be used as a reason to desire run-length encoding (which still requires a program); when you transmit data over a channel, you can always add error-correction to this data. "Data" is just a blob of bits. You can transmit both the data and any required dictionaries over the channel. The error-correcting machinery does not care at all what the bits you transmit are for.
Addendum 2:
There are (14*14) choose 20 possible arrangements, which I assume are randomly chosen. If this number was larger than 128^2 what you're trying to do would be impossible. Fortunately log_2((14*14) choose 20) ~= 90bits < 128bits so it's possible.
The simple solution of writing down 20 numbers like 32,2,67,175,52,...,168 won't work because log_2(14*14)*20 ~= 153bits > 128bits. This would be equivalent to run-length encoding. We want to do something like this but we are on a very strict budget and cannot afford to be "wasteful" with bits.
Because you care about each possibility equally, your "dictionary"/"program" will simulate a giant lookup table. Matlab's sparse matrix implementation may work but is not guaranteed to work and is thus not a correct solution.
If you can create a bijection between the number range [0,2^128) and subsets of size 20, you're good to go. This corresponds to enumerating ways to descend the pyramid in http://en.wikipedia.org/wiki/Binomial_coefficient to the 20th element of row 196. This is the same as enumerating all "k-combinations". See http://en.wikipedia.org/wiki/Combination#Enumerating_k-combinations
Fortunately I know that Mathematica and Sage and other CAS software can apparently generate the "5th" or "12th" or arbitrarily numbered k-subset. Looking through their documentation, we come upon a function called "rank", e.g. http://www.sagemath.org/doc/reference/sage/combinat/subset.html
So then we do some more searching, and come across some arcane Fortran code like http://people.sc.fsu.edu/~jburkardt/m_src/subset/ksub_rank.m and http://people.sc.fsu.edu/~jburkardt/m_src/subset/ksub_unrank.m
We could reverse-engineer it, but it's kind of dense. But now we have enough information to search for k-subset rank unrank, which leads us to http://www.site.uottawa.ca/~lucia/courses/5165-09/GenCombObj.pdf -- see the section
"Generating k-subsets (of an n-set): Lexicographical
Ordering" and the rank and unrank algorithms on the next few pages.
In order to achieve the exact theoretically optimal compression, in the case of a uniformly random distribution of 1s, we must thus use this technique to biject our matrices to our output number of range <2^128. It just so happens that combinations have a natural ordering, known as ranking and unranking of combinations. You assign a number to each combination (ranking), and if you know the number you automatically know the combination (unranking). Googling k-subset rank unrank will probably yield other algorithms.
Thus your solution would look like this:
serialize the matrix into a list
e.g. [[0,0,1][0,1,1][1,0,0]] -> [0,0,1,0,1,1,1,0,0]
take the indices of the 1s:
e.g. [0,0,1,0,1,1,1,0,0] -> [3,5,6,7]
1 2 3 4 5 6 7 8 9 a k=4-subset of an n=9 set
take the rank
e.g. compressed = rank([3,5,6,7], n=9)
compressed==412 (or something, I made that up)
you're done!
e.g. 412 -binary-> 110011100 (at most n=9bits, less than 2^n=2^9=512)
to uncompress, unrank it
I'll get to 128 bits in a sec, first here's how you fit a 14x14 boolean matrix with exactly 20 nonzeros into 136 bits. It's based on the CSC sparse matrix format.
You have an array c with 14 4-bit counters that tell you how many nonzeros are in each column.
You have another array r with 20 4-bit row indices.
56 bits (c) + 80 bits (r) = 136 bits.
Let's squeeze 8 bits out of c:
Instead of 4-bit counters, use 2-bit. c is now 2*14 = 28 bits, but can't support more than 3 nonzeros per column. This leaves us with 128-80-28 = 20 bits. Use that space for array a4c with 5 4-bit elements that "add 4 to an element of c" specified by the 4-bit element. So, if a4c={2,2,10,15, 15} that means c[2] += 4; c[2] += 4 (again); c[10] += 4;.
The "most wasteful" distribution of nonzeros is one where the column count will require an add-4 to support 1 extra nonzero: so 5 columns with 4 nonzeros each. Luckily we have exactly 5 add-4s available.
Total space = 28 bits (c) + 20 bits
(a4c) + 80 bits (r) = 128 bits.
Your input is a perfect candidate for a sparse matrix. You said you're using Matlab, so you already have a good sparse matrix built for you.
spm = sparse(dense_matrix)
Matlab's sparse matrix implementation uses Compressed Sparse Columns, which has memory usage on the order of 2*(# of nonzeros) + (# of columns), which should be pretty good in your case of 20 nonzeros and 14 columns. Storing 20 values sure is better than storing 196...
Also remember that all matrices in Matlab are going to be composed of doubles. Just because your matrix can be stored as a 1-bit boolean doesn't mean Matlab won't stick it into a 64-bit floating point value... If you do need it as a boolean you're going to have to make your own type in C and use .mex files to interface with Matlab.
After thinking about this again, if all your matrices are going to be this small and they're all binary, then just store them as a binary vector (bitmask). Going off your 14x14 example, that requires 196 bits or 25 bytes (plus n, m if your dimensions are not constant). That same vector in Matlab would use 64 bits per element, or 1568 bytes. So storing the matrix as a bitmask takes as much space as 4 elements of the original matrix in Matlab, for a compression ratio of 62x.
Unfortunately I don't know if Matlab supports bitmasks natively or if you have to resort to .mex files. If you do get into C++ you can use STL's vector<bool> which implements a bitmask for you.

Why is the knapsack problem pseudo-polynomial?

I know that Knapsack is NP-complete while it can be solved by DP. They say that the DP solution is pseudo-polynomial, since it is exponential in the "length of input" (i.e. the numbers of bits required to encode the input). Unfortunately I did not get it. Can anybody explain that pseudo-polynomial thing to me slowly ?
The running time is O(NW) for an unbounded knapsack problem with N items and knapsack of size W. W is not polynomial in the length of the input though, which is what makes it pseudo-polynomial.
Consider W = 1,000,000,000,000. It only takes 40 bits to represent this number, so input size = 40, but the computational runtime uses the factor 1,000,000,000,000 which is O(240).
So the runtime is more accurately said to be O(N.2bits in W), which is exponential.
Also see:
How to understand the knapsack problem is NP-complete?
The NP-Completeness of Knapsack
Complexity of dynamic programming algorithm for the 0-1 knapsack problem
Pseudo-polynomial time
In most of our problems, we're dealing with large lists of numbers which fit comfortably inside standard int/float data types. Because of the way most processors are built to handle 4-8 byte numbers at a time at no additional cost (relative to numbers than fit in, say, 1 byte), we rarely encounter a change in running time from scaling our numbers up or down within ranges we encounter in real problems - so the dominant factor remains just the sheer quantity of data points, the n or m factors that we're used to.
(You can imagine that the Big-O notation is hiding a constant factor that divides-out 32 or 64 bits-per-datum, leaving only the number-of-data-points whenever each of our numbers fit in that many bits or less)
But try reworking with other algorithms to act on data sets involving big ints - numbers that require more than 8 bytes to represent - and see what that does to the runtime. The magnitude of the numbers involved always makes a difference, even in the other algorithms like binary sort, once you expand beyond the buffer of safety conventional processors give us "for-free" by handling 4-8 byte batches.
The trick with the Knapsack algorithm that we discussed is that it's unusually sensitive (relative to other algorithms ) to the magnitude of a particular parameter, W. Add one bit to W and you double the running time of the algorithm. We haven't seen that kind of dramatic response to changes in value in other algorithms before this one, which is why it might seem like we're treating Knapsack differently - but that's a genuine analysis of how it responds in a non-polynomial fashion to changes in input size.
The way I understand this is that the capacity would've been O(W) if the capacity input were an array of [1,2,...,W], which has a size of W. But the capacity input is not an array of numbers, it's instead a single integer. The time complexity is about the relationship to the size of input. The size of an integer is NOT the value of the integer, but the number of bits representing it. We do later convert this integer W into an array [1,2,...,W] in the algorithm, leading people into mistakenly thinking W is the size, but this array is not the input, the integer itself is.
Think of input as "an array of stuff", and the size as "how many stuff in the array". The item input is actually an array of n items in the array so size=n. The capacity input is NOT an array of W numbers in it, but a single integer, represented by an array of log(W) bits. Increase the size of it by 1 (adding 1 meaningful bit), W doubles so run time doubles, hence the exponential time complexity.
The Knapsack algorithm's run-time is bound not only on the size of the input (n - the number of items) but also on the magnitude of the input (W - the knapsack capacity) O(nW) which is exponential in how it is represented in computer in binary (2^n) .The computational complexity (i.e how processing is done inside a computer through bits) is only concerned with the size of the inputs, not their magnitudes/values.
Disregard the value/weight list for a moment. Let's say we have an instance with knapsack capacity 2. W would take two bits in the input data. Now we shall increase the knapsack capacity to 4, keeping the rest of the input. Our input has only grown by one bit, but the computational complexity has increased twofold. If we increase the capacity to 1024, we would have just 10 bits of the input for W instead of 2, but the complexity has increased by a factor of 512. Time complexity grows exponentially in the size of W in binary (or decimal) representation.
Another simple example that helped me understand the pseudo-polynomial concept is the naive primality testing algorithm. For a given number n we are checking if it's divided evenly by each integer number in range 2..√n, so the algorithm takes √(n−1) steps. But here, n is the magnitude of the input, not it's size.
Now The regular O(n) case
By contrast, searching an array for a given element runs in polynomial time: O(n). It takes at most n steps and here n is the size of the input (the length of the array).
[ see here ]
Calculating bits required to store decimal number
Complexity is based on input. In knapsack problem, Inputs are size, max Capacity, and profit, weight arrays. We construct dp table as size * W so we feel as its of polynomial time complexity. But, input W is an integer, not an array. So, it will be O(size*(no Of bits required to store given W)). If no of bits increase by 1, then running time doubles. Thus it is exponential, thereby pseudo-polynomial.

Trying to understand why page sizes are a power of 2?

I read this:
Recall that paging is implemented by
breaking up an address into a page and
offset number. It is most efficient to
break the address into X page bits and
Y offset bits, rather than perform
arithmetic on the address to calculate
the page number and offset. Because
each bit position represents a power
of 2, splitting an address between
bits results in a page size that is a
power of 2.
I don't quite understand this answer, can anyone give a simpler explanation?
If you are converting a (linear) address to page:offset, you want to divide the address by the page size and take the integer answer as the page, and the reminder as the offset.
This is done using integer division and modulus (MOD, "%") operators in your programming language.
A computer represents an address as a number, stored as binary bits.
Here's an example address: 12 is 1100 in binary.
If the page size is 3, then we'd need to calculate 12/3 and 12%3 to find the page and offset (4:0 respectively).
However, if the page size is 4 (a power of 2), then 4 in binary is 100, and integer division and modulus can be computed using special 'shortcuts': you can strip the last two binary digits to divide, and you can keep only the last two binary digits for modulus. So:
12/4 == 12>>2 (shifting to remove the last two digits)
12%4 == 12&(4-1) (4-1=3 is binary 11, and the '&' (AND) operator only keeps those)
Working with powers of two leads to more efficient hardware so that's what hardware designers do. Consider a cpu with a 32-bit address, and an n-bit page number:
+----------------------+--------------------+
| page number (n bits) | byte offset (32-n) |
+----------------------+--------------------+
This address is sent to the virtual memory unit, which directly separates the page number and byte offset without any arithmetic operations at all. Ie, it treats the 32-bit value as an array of bits, and has (more or less) a wire running directly to each bit. This allows the memory hardware to extract the page number and byte offset in parallel, without performing any arithmetic operations.
At the same time, this approach requires splitting on a bit boundary, which leads directly to power-of-2 page sizes.
If you have n binary digits at your disposal, then you can encode 2n different values.
Given an address, your description states some bits will be used for the page, and some for the offset. As you're using a whole number of binary bits for the for offset Y, then the page size is naturally a power of 2, specifically 2Y.
Because the representation of the data. The page offset sets the size of the page, and since the data is represented in binary, you will have a number n of bits to define the offset, so you will have pages with a size of 2^n.
Let's suppose you have an address like this 10011001, and you split it in 1001:1001 for page:offset. Since you have 4 bits to define the offset, your page's size is 2⁴.
If it is not exact power of 2 then some memory address will be invalid. eg if the page size is 5 byte then to distinguish each byte we need 3 bit in the offset part of the address. Because using 2 bit only 4 byte can be addressed. But using 3 bit offset for 5 byte page keep two address unused. thats why page size should be .......
since all addresses are in binary and are divided into f and d, f=frame no, d=offset. due to be the size in power of 2 a human no need to go for huge mathematical calculation, just by watching the address bits you can identify the f and d. If page size was 2^n the in physical address last n bits represents the offset and remaining bits represent page numbers.
Hopefully this answer will help you.
I realized what I didn't get was how the page sizes are a power of 2. I found out afterwards:
At the smallest level, the page is 1 bit (0 or 1) and the offset is 1 bit (0 or 1). Combining these together, the page size would be 2 x 2 (or 2^2), because of the number of bits they both have (2 each, so 2 x 2).
Now if the page/offset were larger, then it would be n x n -- n being the number of bits they both have.
The page no. is always in the power of 2 as (2^n).
There are several reasons:
1.The memory address is also in 2^n so it will be more easy to move a page.
2.There are two bits which represent the page:
a.Page no. b.Offset no
So,This is also the reason.
3.Memory is in quantised form like charge(q).

Generating Uniform Random Deviates within a given range

I'd like to generate uniformly distributed random integers over a given range. The interpreted language I'm using has a builtin fast random number generator that returns a floating point number in the range 0 (inclusive) to 1 (inclusive). Unfortunately this means that I can't use the standard solution seen in another SO question (when the RNG returns numbers between 0 (inclusive) to 1 (exclusive) ) for generating uniformly distributed random integers in a given range:
result=Int((highest - lowest + 1) * RNG() + lowest)
The only sane method I can see at the moment is in the rare case that the random number generator returns 1 to just ask for a new number.
But if anyone knows a better method I'd be glad to hear it.
Rob
NB: Converting an existing random number generator to this language would result in something infeasibly slow so I'm afraid that's not a viable solution.
Edit: To link to the actual SO answer.
Presumably you are desperately interested in speed, or else you would just suck up the conditional test with every RNG call. Any other alternative is probably going to be slower than the branch anyway...
...unless you know exactly what the internal structure of the RNG is. Particularly, what are its return values? If they're not IEEE-754 floats or doubles, you have my sympathies. If they are, how many real bits of randomness are in them? You would expect 24 for floats and 53 for doubles (the number of mantissa bits). If those are naively generated, you may be able to use shifts and masks to hack together a plain old random integer generator out of them, and then use that in your function (depending on the size of your range, you may be able to use more shifts and masks to avoid any branching if you have such a generator). If you have a high-quality generator that produces full quality 24- or 53-bit random numbers, then with a single multiply you can convert them from [0,1] to [0,1): just multiply by the largest generatable floating-point number that is less than 1, and your range problem is gone. This trick will still work if the mantissas aren't fully populated with random bits, but you'll need to do a bit more work to find the right multiplier.
You may want to look at the C source to the Mersenne Twister to see their treatment of similar problems.
I don't see why the + 1 is needed. If the random number generator delivers a uniform distribution of values in the [0,1] interval then...
result = lowest + (rng() * (highest - lowest))
should give you a unform distribution of values between lowest
rng() == 0, result = lowest + 0 = lowest
and highest
rng() == 1, result = lowest + highest - lowest = highest
Including + 1 means that the upper bound on the generated number can be above highest
rng() == 1, result = lowest + highest - lowest + 1 = highest + 1.
The resulting distribution of values will be identical to the distribution of the random numbers, so uniformity depends on the quality of your random number generator.
Following on from your comment below you are right to point out that Int() will be the source of a lop-sided distribution at the tails. Better to use Round() to the nearest integer or whatever equivalent you have in your scripting language.