I'm reading the book by David Patterson and John Hennessy titled Computer Organization and Design. In the RISC-V instruction set, which the book is about, there are two instruction formats related to jumping - SB-type and UJ-type. The former uses a 12-bit constant to represent the offset (in bytes) to jump from the current instruction, and the latter uses a 20-bit constant for the same purpose. Then the author says the following:
Since the program counter (PC) contains the address of the current instruction, we can branch (SB-type) within ±2^10 words of the current instruction, or jump (UJ) within ±2^18 words of the current instruction, if we use the PC as the register to be added to the address.
I don't understand how they get those 2^10 and 2^18. Since the constant the instructions use is in two's complement, it can represent values from -2^11 to 2^11 - 1 in the first case and from -2^19 to 2^19 - 1 in the second. These constants represent bytes, but we want to know how many words we can jump over, so we need to divide the maximum number of bytes by four; the maximum we can get is therefore 2^11 / 2^2 = 2^9 words in the first case and 2^17 in the second.
Could someone please take a look at my calculations above and point out what I'm missing and where my reasoning goes wrong?
UPDATE:
Probably I didn't understand the author correctly. Could it be that they mean the lower bound (-2^10) and the upper bound (+2^10)? That is, that we can never jump beyond 2^10 words from the current instruction?
We were asked to find a way to compress a square binary matrix as much as possible, and if possible, to add redundancy bits to check and maybe correct errors.
The redundancy part is easy to implement, in my opinion. The complicated part is compressing the matrix. I thought about using run-length encoding after reshaping the matrix into a vector, because there will be more zeros than ones, but I only achieved a 40-bit compression (we are working with small sizes), although I expected better.
Also, an idea was to Huffman-code the matrix after run-length encoding, but then a dictionary must be sent along in order to recover the original information.
I'd like to know what would be the best way to compress a binary matrix?
After reading some comments: yes @Adam, you're right, the 14x14 matrix should be compressed into 128 bits, so if I only use the coordinates (rows & cols) of each non-zero element, it would still take 160 bits (since there are twenty ones). I'm not looking for an exact solution but for a useful idea.
You can only talk about compressing something if you have a distribution and a representation. That's the issue of the dictionary you have to send along: you always need some sort of dictionary or protocol to uncompress something. It just so happens that things like .zip and .mpeg already have those dictionaries/codecs. Even something as simple as Huffman encoding is an algorithm; on the other side of the communication channel (you can think of compression as communication), the other person already has a bit of code (the dictionary) to perform the Huffman decompression scheme.
Thus you cannot even begin to talk about compressing something without first thinking "what kinds of matrices do I expect to see?", "is the data truly random, or is there order?", and if so "how can I represent the matrices to take advantage of order in the data?".
You cannot compress some matrices without increasing the size of other objects (by at least 1 bit). This is bad news if all matrices are equally probable, and you care equally about them all.
Addenda:
The answer suggesting sparse-matrix machinery is not necessarily the right one. The matrix could, for example, be represented in Python as [[(r+c)%2 for c in range(cols)] for r in range(rows)] (a checkerboard pattern), and a sparse matrix wouldn't compress it at all, but the Kolmogorov complexity of the matrix is at most the above program's length.
Well, I know every matrix will have the same number of ones, so this is kind of deterministic. The only thing I don't know is where the 1's will be. Also, if I transmit the matrix with a dictionary and there are burst errors, maybe the dictionary gets affected, so... wouldn't the resulting information be corrupted? That's why I was trying to use lossless data compression such as run-length encoding; the decoder just doesn't need a dictionary. --original poster
How many 1s does the matrix have as a fraction of its size, and what is its size (NxN -- what is N)?
Furthermore, this is an incorrect assertion and should not be used as a reason to desire run-length encoding (which still requires a program); when you transmit data over a channel, you can always add error-correction to this data. "Data" is just a blob of bits. You can transmit both the data and any required dictionaries over the channel. The error-correcting machinery does not care at all what the bits you transmit are for.
Addendum 2:
There are (14*14) choose 20 possible arrangements, which I assume are randomly chosen. If this number were larger than 2^128, what you're trying to do would be impossible. Fortunately log_2((14*14) choose 20) ~= 90 bits < 128 bits, so it's possible.
The simple solution of writing down 20 numbers like 32, 2, 67, 175, 52, ..., 168 won't work because log_2(14*14) * 20 ~= 153 bits > 128 bits. This would be equivalent to run-length encoding. We want to do something like this, but we are on a very strict budget and cannot afford to be "wasteful" with bits.
Because you care about each possibility equally, your "dictionary"/"program" will simulate a giant lookup table. Matlab's sparse matrix implementation may work but is not guaranteed to work and is thus not a correct solution.
If you can create a one-to-one mapping from subsets of size 20 into the number range [0, 2^128), you're good to go. This corresponds to enumerating ways to descend the pyramid in http://en.wikipedia.org/wiki/Binomial_coefficient to the 20th element of row 196. This is the same as enumerating all "k-combinations". See http://en.wikipedia.org/wiki/Combination#Enumerating_k-combinations
Fortunately I know that Mathematica and Sage and other CAS software can apparently generate the "5th" or "12th" or arbitrarily numbered k-subset. Looking through their documentation, we come upon a function called "rank", e.g. http://www.sagemath.org/doc/reference/sage/combinat/subset.html
So then we do some more searching, and come across some arcane Fortran code like http://people.sc.fsu.edu/~jburkardt/m_src/subset/ksub_rank.m and http://people.sc.fsu.edu/~jburkardt/m_src/subset/ksub_unrank.m
We could reverse-engineer it, but it's kind of dense. But now we have enough information to search for "k-subset rank unrank", which leads us to http://www.site.uottawa.ca/~lucia/courses/5165-09/GenCombObj.pdf -- see the section "Generating k-subsets (of an n-set): Lexicographical Ordering" and the rank and unrank algorithms on the next few pages.
In order to achieve the exact theoretically optimal compression, in the case of a uniformly random distribution of 1s, we must thus use this technique to map our matrices to numbers in the range [0, 2^128). It just so happens that combinations have a natural ordering, known as ranking and unranking of combinations. You assign a number to each combination (ranking), and if you know the number you automatically know the combination (unranking). Googling "k-subset rank unrank" will probably yield other algorithms.
Thus your solution would look like this:
serialize the matrix into a list, e.g. [[0,0,1],[0,1,1],[1,0,0]] -> [0,0,1,0,1,1,1,0,0]
take the (1-based) positions of the 1s, e.g. [0,0,1,0,1,1,1,0,0] -> [3,5,6,7]; that is a k=4-subset of an n=9 set (positions 1 2 3 4 5 6 7 8 9)
take the rank, e.g. compressed = rank([3,5,6,7], n=9); compressed == 412 (or something, I made that up)
you're done! e.g. 412 -binary-> 110011100 (at most n=9 bits, since the rank is less than 2^n = 2^9 = 512)
to uncompress, unrank it
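For illustration, here is a minimal Python sketch of lexicographic ranking and unranking of k-subsets (using 0-based indices; the function bodies and the use of math.comb are my own choices, not taken from the references above):

from math import comb, log2

def rank(indices, n):
    # Lexicographic rank of a sorted k-subset of {0, ..., n-1}.
    k = len(indices)
    r, prev = 0, -1
    for i, idx in enumerate(indices):
        # Count every subset whose element at position i is smaller.
        for j in range(prev + 1, idx):
            r += comb(n - 1 - j, k - 1 - i)
        prev = idx
    return r

def unrank(r, n, k):
    # Inverse of rank: recover the sorted k-subset from its number.
    indices, j = [], 0
    for i in range(k):
        while r >= comb(n - 1 - j, k - 1 - i):
            r -= comb(n - 1 - j, k - 1 - i)
            j += 1
        indices.append(j)
        j += 1
    return indices

print(rank([2, 4, 5, 6], 9))   # 101 -- the example above with 0-based positions
print(unrank(101, 9, 4))       # [2, 4, 5, 6]
print(log2(comb(196, 20)))     # ~90.9, the ~90 bits quoted earlier

Note that the rank of any 4-subset of a 9-set is below C(9,4) = 126 and thus fits in 7 bits; for the 14x14 problem the rank fits in 91 bits, comfortably under the 128-bit budget.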
I'll get to 128 bits in a sec; first, here's how you fit a 14x14 boolean matrix with exactly 20 nonzeros into 136 bits. It's based on the CSC (compressed sparse column) matrix format.
You have an array c with 14 4-bit counters that tell you how many nonzeros are in each column.
You have another array r with 20 4-bit row indices.
56 bits (c) + 80 bits (r) = 136 bits.
Let's squeeze 8 bits out of c:
Instead of 4-bit counters, use 2-bit ones. c is now 2*14 = 28 bits, but it can't count more than 3 nonzeros per column. This leaves us with 128 - 80 - 28 = 20 bits. Use that space for an array a4c of five 4-bit elements, each of which "adds 4 to the element of c" that it names. So, if a4c = {2, 2, 10, 15, 15}, that means c[2] += 4; c[2] += 4 (again); c[10] += 4; the value 15 marks an unused slot, since there is no column 15.
The "most wasteful" distribution of nonzeros is one where the column count will require an add-4 to support 1 extra nonzero: so 5 columns with 4 nonzeros each. Luckily we have exactly 5 add-4s available.
Total space = 28 bits (c) + 20 bits (a4c) + 80 bits (r) = 128 bits.
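As a rough Python sketch of this packing (my own illustration of the scheme above, not tested production code):

def encode128(matrix):
    # Pack a 14x14 0/1 matrix with exactly 20 ones into a 128-bit integer:
    # 14 x 2-bit column counters, 5 x 4-bit "add-4" slots, 20 x 4-bit row indices.
    counts = [sum(matrix[r][c] for r in range(14)) for c in range(14)]
    row_idx = [r for c in range(14) for r in range(14) if matrix[r][c]]
    c2, a4c = [], []
    for c, cnt in enumerate(counts):
        adds, rem = divmod(cnt, 4)
        c2.append(rem)            # what fits in the 2-bit counter
        a4c.extend([c] * adds)    # one "add 4 to column c" entry per overflow of 4
    assert len(a4c) <= 5          # 20 ones never need more than 5 add-4s
    a4c += [15] * (5 - len(a4c))  # 15 = unused slot
    bits = 0
    for v in c2:                  # 28 bits
        bits = (bits << 2) | v
    for v in a4c:                 # 20 bits
        bits = (bits << 4) | v
    for v in row_idx:             # 80 bits
        bits = (bits << 4) | v
    return bits                   # 28 + 20 + 80 = 128 bits total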
Your input is a perfect candidate for a sparse matrix. You said you're using Matlab, so you already have a good sparse matrix type built for you.
spm = sparse(dense_matrix)
Matlab's sparse matrix implementation uses Compressed Sparse Columns, which has memory usage on the order of 2*(# of nonzeros) + (# of columns), which should be pretty good in your case of 20 nonzeros and 14 columns. Storing 20 values sure is better than storing 196...
Also remember that all matrices in Matlab are going to be composed of doubles. Just because your matrix can be stored as a 1-bit boolean doesn't mean Matlab won't stick it into a 64-bit floating point value... If you do need it as a boolean you're going to have to make your own type in C and use .mex files to interface with Matlab.
After thinking about this again, if all your matrices are going to be this small and they're all binary, then just store them as a binary vector (bitmask). Going off your 14x14 example, that requires 196 bits or 25 bytes (plus n, m if your dimensions are not constant). That same vector in Matlab would use 64 bits per element, or 1568 bytes. So storing the matrix as a bitmask takes as much space as 4 elements of the original matrix in Matlab, for a compression ratio of 62x.
Unfortunately I don't know if Matlab supports bitmasks natively or if you have to resort to .mex files. If you do get into C++ you can use STL's vector<bool> which implements a bitmask for you.
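Outside Matlab, the bitmask idea is nearly a one-liner in, say, Python with numpy (a quick sketch, assuming numpy is acceptable in your toolchain):

import numpy as np

dense = np.zeros((14, 14), dtype=bool)
dense[0, 3] = dense[7, 7] = True                  # toy example matrix
packed = np.packbits(dense.flatten())             # 196 bits -> 25 bytes
restored = np.unpackbits(packed)[:196].reshape(14, 14).astype(bool)
assert (restored == dense).all()                  # round-trips losslessly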
Is there some mathematical "optimum" base that would speed up factorial calculation?
Background:
Just for fun, I'm implementing my own bignum library. (-: Is this my first mistake? :-).
I'm experimenting with various bases used in the internal representation and regression testing by printing out exact values (in decimal) for n factorial (n!).
The way my bignum library represents integers and does multiplication, the time is proportional to the total number of "1" bits in the internal representation of n!.
Using base 2, 4, 8, 16, 2^8, 2^30, etc. in my internal representation gives me exactly the same total number of "1" bits for any particular number.
Unless I've made some mistake, any given factorial (n!) represented in base 18 has fewer "1" bits than the same value represented in base 10, base 16, or base 19.
And so (in principle), using base 18 would make my bignum library run faster than using base 10, some binary 2^w base, or base 19.
I think this has something to do with the fact that n!, printed out in base 18, is either shorter or has more "trailing zeros" (or both) than in base 10, base 16, or base 19.
Is there some other base that would work even better than base 18?
In other words,
Is there a base that represents n! with even fewer "1" bits than base 18?
This is not a dup of "What is a convenient base for a bignum library & primality testing algorithm?" because I suspect "the optimum base for working with integers that are known to be large factorials, with lots of factors of 2 and 3" is different than "the optimum base for working with integers that don't have any small factors and are possibly prime".
(-: Is speeding up factorial calculations -- perhaps at the expense of other kinds of calculations -- my second mistake? :-)
edit:
For example:
(decimal)     16! == 20,922,789,888,000   // 14 "1" bits
(dozenal)     16! == 2,41A,B88,000,000    // 10 "1" bits
(hexadecimal) 16! == 130,777,758,000      // 18 "1" bits
(octadecimal) 16! == 5F,8B5,024,000       // 14 "1" bits
(I'm more-or-less storing the digits on the right, without the commas, plus some metadata overhead).
(While one might think "as you increase the base, you will use fewer "1" bits to represent a given number" or "as you increase the base, you will use fewer nonzero digits to represent a given number", the above example shows that is not always true.)
I'm storing each digit as a small integer ("int" or "long int" or "byte"). Is there any other reasonable way to store digits?
I'm pretty sure my computer stores those integers in binary -- each "1", "2", "4", "8", and "G" digit uses one "1" bit; each "3", "5", "6", "9", and "A" digit uses two "1" bits; each "7" and "B" digit uses three "1" bits; each "F" digit uses four "1" bits; etc.
Both the decimal and the octadecimal representations of this value (16!) require 14 "1" bits.
So I made a mistake in my earlier calculation: representing n! in octadecimal doesn't always use fewer "1" bits than representing the same value in decimal.
But the question still stands: is there some other "optimum" base that requires the fewest "1" bits for storing large factorials?
Someone asks: "How do you store those numbers?"
Well, that's exactly my question -- what is the best way of storing numbers of the form n! ?
I could internally use digits in base 10, or some power-of-two base, or base 18, or some other base. Which one is best?
I could store these integers internally as a 1D array of digits, with a length however long is needed to store all the digits. Is there any reasonable way of printing out 100! in decimal without such an array?
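A quick way to experiment (a small Python sketch of the storage model described above, where each digit is kept as a binary machine integer):

from math import factorial

def one_bits(n, base):
    # Total "1" bits when n is written in `base` and each digit is stored in binary.
    total = 0
    while n:
        n, digit = divmod(n, base)
        total += bin(digit).count("1")
    return total

x = factorial(16)
for b in (10, 12, 16, 18, 60):
    print(b, one_bits(x, b))   # reproduces the 14 / 10 / 18 / 14 counts above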
If you're just trying to optimize running time for calculating factorials, and changing the base is the only parameter you're changing, then the optimum base will likely contain small factors. 60 might be a reasonable choice. If you want to experiment, I would try various bases of the form (2^a)(3^b)(5^c).
Improving the speed of multiplication is probably the best way to improve performance. What algorithm are you using for multiplication? (school-book, Karatsuba, Toom-Cook, FFT, ...)
There are other factors to consider, too. If you will be converting the numbers to decimal frequently, then a base that is a power of 10 will make the conversion as fast as possible.
Many(*) years ago, I wrote a base-6 floating point library specifically to solve a problem with repeated multiplication/division by 2 and/or 3. But unless you are trying to solve a specific problem, I think you will be better served by optimizing your algorithms in general than by just trying to optimize factorial.
casevh
(*) I originally said "Several years ago" until I remembered the program ran for many days on a 12 MHz 80286.
While from a purely mathematical viewpoint the optimal base is e (and after rounding to the nearest integer, 3), from a practical standpoint for a bignum library on a computer, pick the machine word size as the base of your numeric system (2^32 or 2^64). Yes, it's huge, but the higher abstraction layer of your bignum system is the choke point and the underlying calculations on machine words are the fast part, so delegate as much computation as possible to the CPU's low-level instructions while keeping your own work to a minimum.
And no, it's not a mistake. It's a very good learning exercise.
I will not pretend I know any math, so do not take my answer as the holy "optimum" you are probably looking for. If I had to compute factorials as fast as possible, I would either try some approximation (something like Stirling's approximation) or reduce the number of multiplications, because multiplication is an expensive operation. If you represent the number in base k, you can simulate multiplication by k with a shift. If you choose base 2, half of all multiplications will be shifts; the other multiplications are shifts plus one bit flip. If you aim to minimize the number of "1"s in your representation, that depends on which numbers you represent: as you increase the base, you will use fewer digits to represent a given number, but you will need more bits for every digit position, which means more potential "1"s. I hope this helps at least a bit; if not, just ask and I will try to answer.
If by '"1" bits' you mean digits, then I suggest a base of 256 or 65536. In other words, make each byte / word a "digit" for the purposes of your math. The computer handles these numbers routinely and is optimized for doing so. Your factorials will be swift, and so will other operations.
Not to mention the computer handles a great deal of the conversions from a similar base to these with ease. (rhyming unintentional)
What is the optimal way to find a repetition in an infinite sequence of integers?
i.e. if in the infinite sequence the number '5' appears twice then we will return 'false' the first time and 'true' the second time.
In the end what we need is a function that returns 'true' if the integer appeared before and 'false' if the function received the integer the first time.
If there are two solutions, one optimal space-wise and the other optimal time-wise, then mention both.
I will write my solution in the answers, but I don't think it is the optimal one.
edit: Please don't assume the trivial cases (i.e. no repetitions, a constantly rising sequence). What interests me is how to reduce the space complexity of the non-trivial case (random numbers with repetitions).
I'd use the following approach:
Use a hash table as your data structure. For every number read, store it in the table; if it's already stored, you've found a repetition.
If n is the number of elements in the sequence from start to the repetition, then this only requires O(n) time and space. Time complexity is optimal, as you need to at least read the input sequence's elements up to the repetition point.
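A minimal sketch in Python (the function names are my own):

def make_seen_checker():
    seen = set()                # hash table of everything read so far
    def saw_before(x):
        if x in seen:
            return True         # repetition found
        seen.add(x)
        return False            # first occurrence
    return saw_before

check = make_seen_checker()
print(check(5))   # False -- first time
print(check(5))   # True  -- repetition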
How long of a sequence are we talking (before the repetition occurs)? Is a repetition even guaranteed at all? For extreme cases the space complexity might become problematic. But to improve it you will probably need to know more structural information on your sequence.
Update: If the sequence is as you say very long with seldom repetitions and you have to cut down on the space requirement, then you might (given sufficient structural information on the sequence) be able to cut down the space cost.
As an example: let's say you know that your infinite sequence has a general tendency to return numbers that fit within the current range of witnessed min-max numbers. Then you will eventually have whole intervals that have already been contained in the sequence. In that case you can save space by storing such intervals instead of all the elements contained within it.
A BitSet for int values (2^32 numbers) would consume 512 MB. This may be OK if the BitSets are allocated not too often, allocation is fast enough, and the memory is available.
An alternative is compressed BitSets, which work best for sparse BitSets.
Actually, if the number of possible values is finite, you can use any lossless compression algorithm for a monochrome bitmap. If you imagine a square with at least as many pixels as the number of possible values, you can map each value to a pixel (with a few to spare). Then you can represent white as the pixels that appeared and black as the others, and use any compression algorithm if space is at a premium (that is certainly a problem that has been studied).
You can also store blocks. The worst case is the same in space, O(n), but that worst case only occurs when the numbers that have appeared alternate with numbers that haven't (gaps of exactly one between them). Once more numbers appear, the storage will decrease:
I'll write it in Python using a sorted list of block boundaries, but you can always use a different structure.
import bisect

changes = []  # sorted boundaries; seen numbers form blocks [changes[2k], changes[2k+1])

def add_number(number):
    i = bisect.bisect_right(changes, number)
    if i % 2 == 1:                   # an odd count of boundaries <= number means
        return True                  # we are inside a block: it appeared before
    ends_here = i > 0 and changes[i - 1] == number               # a block ends at number
    starts_next = i < len(changes) and changes[i] == number + 1  # a block starts at number+1
    if ends_here and starts_next:
        del changes[i - 1:i + 1]     # the gap closes: join the two blocks
    elif ends_here:
        changes[i - 1] = number + 1  # extend the previous block rightwards
    elif starts_next:
        changes[i] = number          # extend the next block leftwards
    else:
        changes[i:i] = [number, number + 1]  # insert a new block [number, number+1)
    return False
What this code does is the following: it stores a sorted list of boundaries describing blocks of numbers that have appeared. I will use [) notation to show which numbers are in a block: the first number is included, the last is not. In the code, whether a value lies inside a block is recovered from the parity of its position among the boundaries. For instance, if you get 5, 9, 6, 8, 7 (in this order), you will have the following blocks after each call:
[5,6)
[5,6),[9,10)
[5,7),[9,10)
[5,7),[8,10)
[5,10)
At the end you keep a block of five numbers using only two stored values; adding any of 5 through 9 again would now return true.
If the sequence is truly random and unbounded in length, then eventually there will be a repetition of every conceivable pattern.
If what you want to know is the first place in the sequence when there is a repeated digit that's another matter, but there's some difference between your question and your example.
Well, it seems obvious that any solution will need to save the numbers that have already appeared, so space-wise we will always have a worst case of O(N), where N is at most the number of representable values of our number type (e.g., 2^32 for a C# int) - this is problematic over a long run if the sequence is really infinite and rarely repeats itself.
For saving the numbers that have already appeared, I would use a hash table and then check it each time I receive a new number.
What is the best way to constrain the values of a PRNG to a smaller range? If you use modulus and the old max is not evenly divisible by the new max, you bias toward 0 through (old_max mod new_max) - 1. I assume the best way would be something like this (this is floating point, not integer math):
random_num = PRNG() / max_original_range * max_smaller_range
But something in my gut makes me question that method (maybe floating point implementation and representation differences?).
The random number generator will produce consistent results across hardware and software platforms, and the constraint needs to as well.
I was right to doubt the pseudocode above (but not for the reasons I was thinking). MichaelGG's answer got me thinking about the problem in a different way. I can model it using smaller numbers and test every outcome. So, let's assume we have a PRNG that produces a random number between 0 and 31 and you want the smaller range to be 0 to 9. If you use modulus you bias toward 0 and 1 (each is produced by four of the 32 inputs instead of three). If you use the pseudocode above you bias toward 0 and 5 instead. I don't think there can be a good way to map one set into the other. The best that I have come up with so far is to regenerate any random number that falls above the largest multiple of new_max that fits in the old range, but that has deep problems as well (reducing the period, time to generate new numbers until one is in the right range, etc.).
I think I may have naively approached this problem. It may be time to start some serious research into the literature (someone has to have tackled this before).
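For what it's worth, here is a Python sketch of that regenerate-on-bias idea (rejection sampling), assuming a source of uniform 32-bit integers:

import random

def bounded(prng32, n):
    # Unbiased integer in [0, n): reject raw values in the biased tail,
    # i.e. those at or above the largest multiple of n that fits in 2^32.
    limit = (1 << 32) - ((1 << 32) % n)
    while True:
        x = prng32()
        if x < limit:
            return x % n

print(bounded(lambda: random.getrandbits(32), 10))  # e.g. 7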
I know this might not be a particularly helpful answer, but I think the best way would be to conceive of a few different methods, then try them out a few million times and check the result sets.
When in doubt, try it yourself.
EDIT
It should be noted that many languages (like C#) have built-in limiting in their random functions:
int maximumvalue = 20;
Random rand = new Random();
rand.Next(maximumvalue);
And whenever possible, you should use those rather than any code you would write yourself. Don't Reinvent The Wheel.
This problem is akin to rolling a k-sided die given only a p-sided die, without wasting randomness.
In this sense, by Lemma 3 in "Simulating a dice with a dice" by B. Kloeckner, this waste is inevitable unless "every prime number dividing k also divides p". Thus, for example, if p is a power of 2 (and any block of random bits is the same as rolling a die with a power of 2 number of faces) and k has prime factors other than 2, the best you can do is get arbitrarily close to no waste of randomness, such as by batching multiple rolls of the p-sided die until p^n is "close enough" to a power of k.
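As a concrete illustration of the batching idea (a Python sketch where 2 plays the role of p and 10 the role of k):

import random

def nine_digits():
    # One 32-bit draw yields nine uniform decimal digits: reject draws at or
    # above 4*10^9 (the largest multiple of 10^9 below 2^32), then peel digits.
    while True:
        x = random.getrandbits(32)
        if x < 4 * 10**9:
            break
    x %= 10**9                   # uniform in [0, 10^9)
    return [x // 10**i % 10 for i in range(9)]

print(nine_digits())   # e.g. [3, 0, 7, 7, 1, 9, 2, 4, 5]

Larger batches push the cost per digit toward the log_2(10) ~= 3.32-bit entropy limit.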
Let me also go over some of your concerns about regenerating random numbers:
"Reducing the period": Besides batching of bits, this concern can be dealt with in several ways:
Use a PRNG with a bigger "period" (maximum cycle length).
Add a Bays–Durham shuffle to the PRNG's implementation.
Use a "true" random number generator; this is not trivial.
Employ randomness extraction, which is discussed in Devroye and Gravel 2015-2020 and in my Note on Randomness Extraction. However, randomness extraction is pretty involved.
Ignore the problem, especially if it isn't a security application or serious simulation.
"Time to generate new numbers until one is in the right range": If you want unbiased random numbers, then any algorithm that does so will generally have to run forever in the worst case. Again, by Lemma 3, the algorithm will run forever in the worst case unless "every prime number dividing k also divides p", which is not the case if, say, k is 10 and p is 32.
See also the question: How to generate a random integer in the range [0,n] from a stream of random bits without wasting bits?, especially my answer there.
If PRNG() is generating uniformly distributed random numbers then the above looks good. In fact (if you want to scale the mean etc.) the above should be fine for all purposes. I guess you need to ask what the error associated with the original PRNG() is, and whether further manipulating will add to that substantially.
If in doubt, generate an appropriately sized sample set, and look at the results in Excel or similar (to check your mean / std.dev etc. for what you'd expect)
If you have access to a PRNG function (say, random()) that'll generate numbers in the range 0 <= x < 1, can you not just do:
random_num = (int) (random() * max_range);
to give you numbers in the range 0 to max_range?
Here's how the CLR's Random class works when limited (as per Reflector):
long num = maxValue - minValue;
if (num <= 0x7fffffffL) {
    return (((int) (this.Sample() * num)) + minValue);
}
return (((int) ((long) (this.GetSampleForLargeRange() * num))) + minValue);
Even if you're given a positive int, it's not hard to get it to a double. Just multiply the random int by (1/maxint). Going from a 32-bit int to a double should provide adequate precision. (I haven't actually tested a PRNG like this, so I might be missing something with floats.)
Pseudo-random number generators essentially produce a random series of 1s and 0s which, when appended to each other, form an infinitely large number in base two. Each time you consume a bit from your PRNG, you are dividing that number by two and keeping the remainder. You can do this forever without wasting a single bit.
If you need a number in the range [0, N), then you need the same thing, but in base N instead of base two. It's basically trivial to convert the bases. Consume the number of bits you need, and return the remainder of those bits back to your PRNG to be used the next time a number is needed.