Encoder number of outpus for opcode within a MIPS machine instruction - mips

If I have an encoder with 8 data inputs, what is its maximum number of outputs?
I know that an encoder is a combinational circuit that performs the reverse operation of a decoder. It has a maximum of 2^n input lines and ‘n’ output lines, hence it encodes the information from 2^n inputs into an n-bit code. Since I have 8 data input, the output will be 3, since 2^3 = 8. Is that the correct assumption?

Let's try to tease apart the concepts of one hot (decoded) lines and an encoding using a number of bits.  Both these concepts are a way to represent information, but their form and typical usage is different.
One hot is a technique wherein at most one line is 1/true and all the other lines are 0/false.  These one hot lines are not considered digits in a number, but rather individual signals or conditions (only one of which is can be true at any given time).  This form is particularly useful in certain circuits, as each of the one hot lines can activate some other hardware.  (A hardware lookup table (LUT), a RAM or ROM may use one-hot within its internal array indexing.)
Encoding is a technique where we use N lines as digits in an N-bit number, as would be found in a CPU register holding a number, or as we might write normal binary numbers in text.  By contrast, in this form any of the N bits can be 1 (or 0).
Simple encoders & decoders translate between encoded form (N-bit numbers) and one hot form (2N lines).
... encoder ... has a maximum of 2^n input lines and ‘n’ output lines
In your statement, the 2^n input lines are in one hot form, while the output lines are normal numbers in binary (i.e. encoded).
Both the inputs (2^n lines) and the outputs (n lines) are capable of representing exactly 2^n different values!  As a result, decode/encode is a 1:1 mapping, back & forth.  (It would be an error to have multiple hots on the input side of such a decoder, and bad things would happen in a system that allowed that.)
In the formulas you're speaking to:  2N = V,  and   N = log2 ( V )  —  N stands for number of bits (a bit is a binary digit), and V stands for number of values that can be represented in N bits.
(While the 2's in these formulas are for binary — substitute 2 with 10 for the same relationships for number of decimal digits vs. number of values those number of digits can represent/store/communicate).
In one hot form we need V number of lines, whereas in encoded form we need N lines (as bits/digits) to represent the same information (one of V different values).
Consider whether a number you're looking is a digit count (as with N) or a value count (as with V).
And bear in mind that in one hot form, we need one line for each possible value, V (whereas in encoded form we need N bits for V possible values).
A MIPS processor will feed the 6 bit opcode field into a lookup table of some sort, in order to determine which set of control signals to activate for any given instruction.  (The opcode field is not one hot, but rather a bit field of N=6 bits).
These control signals are (also) not one hot, and the MIPS instruction decoder is not using a simple decoder, but rather a mapper that goes between encoded opcode values and effectively encoded control signals — this mapping is accomplished by lookup in a table.
These control signals are individual boolean values rather than as a set either one-hot or an encoded number.  One hot may be used internally in indexing of this mapping.  This mapping is basically an array lookup where the index is the opcode and each array element has all the individual control signal values appropriate its index.
(R-Type instructions all share a common opcode value, so when the R-Type opcode value is present, then additional lookup/mapping is done on the func bit field to generate the proper control signals.)

Related

LC-3 algorithm for converting ASCII strings to Binary Values

Figure 10.4 provides an algorithm for converting ASCII strings to binary values. Suppose the decimal number is arbitrarily long. Rather than store a table of 10 values for the thousands-place digit, another table for the 10 ten-thousands-place digit, and so on, design an algorithm to do the conversion without resorting to any tables whatsoever.
I have attached pictures of figure 10.4. I am not looking for an answer to the problem, but rather can someone please explain this problem and perhaps give some direction on how to go about creating the algorithm?
Figure 10.4
Figure 10.4 second image
I am unsure as to what it means by tables and do not know where to start really.
The tables are those global, initialized arrays: one called Lookup10 holding 10, 20, 30, 40, ..., and another called Lookup100 holding 100, 200, 300, 400...
You can ignore the tables: as per the assignment instructions you're supposed to find a different way to accomplish this anyway.  Or, you can run that code in simulator or mentally to understand how it works.
The bottom line is that LC-3, while it can do anything (it is turning complete), it can't do much in any one instruction.  For arithmetic & logic, it can do add, not, and.  That's pretty much it!  But that's enough — let's note that modern hardware does everything with only one logic gate, namely NAND, which is a binary operator (so NAND directly available; NOT by providing NAND with the same operand for both inputs; AND by doing NOT after NAND; OR using NOT on both inputs first and then NAND; etc..)
For example, LC-3 cannot multiply or divide or modulus or right shift directly — each of those operations is many instructions and in the general case, some looping construct.  Multiplication can be done by repetitive addition, and division/modulus by repetitive subtraction.  These are super inefficient for larger operands, and there are much more efficient algorithms that are also substantially more complex, so those greatly increase program complexity beyond that already with the repetitive operation approach.
That subroutine goes backwards through the use input string.  It takes a string length count in R1 as parameter supplied by caller (not shown).  It looks at the last character in the input and converts it from an ASCII character to a binary number.
(We would commonly do that conversion from ascii character to numeric value using subtraction: moving the character values from the ascii character range of 0x30..0x39 to numeric values in the range 0..9, but they do it with masking, which also works.  The subtraction approach integrates better with error detection (checking if not a valid digit character, which is not done here), whereas the masking approach is simpler for LC-3.)
The subroutine then obtains the 2nd last digit (moving backwards through the user's input string), converting that to binary using the mask approach.  That yields a number between 0 and 9, which is used as an index into the first table Lookup10.  The value obtained from the table at that index position is basically the index × 10.  So this table is a × 10 table.  The same approach is used for the third (and first or, last-going-backwards) digit, except it uses the 2nd table which is a × 100 table.
The standard approach for string to binary is called atoi (search it) standing for ascii to integer.  It moves forwards through the string, and for every new digit, it multiples the existing value, computed so far, by 10 before adding in the next digit's numeric value.
So, if the string is 456, the first it obtains 4, then because there is another digit, 4 × 10 = 40, then + 5 for 45, then × 10 for 450, then + 6 for 456, and so on.
The advantage of this approach is that it can handle any number of digits (up to overflow).  The disadvantage, of course, is that it requires multiplication, which is a complication for LC-3.
Multiplication where one operand is the constant 10 is fairly easy even in LC-3's limited capabilities, and can be done with simple addition without looping.  Basically:
n × 10 = n + n + n + n + n + n + n + n + n + n
and LC-3 can do those 9 additions in just 9 instructions.  Still, we can also observe that:
n × 10 = n × 8 + n × 2
and also that:
n × 10 = (n × 4 + n) × 2     (which is n × 5 × 2)
which can be done in just 4 instructions on LC-3 (and none of these needs looping)!
So, if you want to do this approach, you'll have to figure out how to go forwards through the string instead of backwards as the given table version does, and, how to multiply by 10 (use any one of the above suggestions).
There are other approaches as well if you study atoi.  You could keep the backwards approach, but now will have to multiply by 10, by 100, by 1000, a different factor for each successive digit .  That might be done by repetitive addition.  Or a count of how many times to multiply by 10 — e.g. n × 1000 = n × 10 × 10 × 10.

How does a computer turn a string of ASCII into a signed or unsigned number?

For example if I type:
-6
Through what mechanism is that turned into:
1010
Would it be hardware based or somewhere in the kernel?
Would it be hardware based or somewhere in the kernel?
Usually no and no.
The kernel in a mainstream OS like Linux will usually just pass along bytes of text to user-space.
So a user-space program gets a string, i.e. a sequence of characters. (In simple cases, e.g. the ASCII subset of UTF-8, each character is a single byte.) A program would typically use a function like atoi() to convert a sequence of characters (representing ASCII codes for digits) to a binary integer. It's a standard library function because many programs need to deal with strings that represent integers, but it's a software function just like any other.
A simple implementation would have a loop like
int sum = 0;
for (auto d: digits) { // look at digits in MSB-first order
sum = 10*sum + d;
}
// the first digit ends up being multiplied by 10 n times
// the 2nd by 10 n-1 times, and so on. Each digit is multiplied by its place value.
This C++ source would be compiled to multiple asm instructions that implement it. Handling an optional - by negating is also a separate instruction. There's typically a neg instruction of some sort, or a way to subtract from zero, to get the 2's complement inverse. (Assuming 2's complement hardware).
You can speed this up by using fancier instructions that do more work per instruction / per clock cycle. On x86 for example you can convert a multi-digit string of digits to a binary integer with a few SIMD instructions, but that's still just using multiply and add instructions. See How to implement atoi using SIMD? for a nice use of pmaddwd to multiply by a vector of place-values and horizontally add. Also Fastest way to get IPv4 address from string is a cool examples of what you can do with packed-compare and looking up a pshufb shuffle-control vector from a table based on that compare result.
A function like scanf("%d", &num) that reads input as a number is implemented in user-space, but under the hood it uses a system call like read() to get data. (If the C stdio input buffer was empty.)
Some "toy" / teaching systems like the MARS and SPIM MIPS simulators have system calls that get get or print integers (with the input or result in an integer register). In that case, yes, the kernel does it in software.
Or depending on the implementation, there isn't actually a kernel at all, and the syscall instruction escapes to the emulator / simulator's input/output function, so from the POV of software running inside this virtual simulated machine, there really is hardware support for integer conversion. But no real hardware does the entire thing in microcode or actual hardware, at least not any mainstream architectures.

Reading a CSV file of varying precision in Fortran

I am using an external program to run a simulation which returns to me a csv file containing output data. I need to read the data from this file into my fortran program, which analyses and optimizes the input conditions to rerun the external program.
The CSV file has say 20 columns and 70 rows. Each column contains output data for a specific parameter. Now since that program is not written by me, I cannot control the precision of the output values. So in many cases the external program truncates the number of digits after the decimal it they are zero. So it is possible in run number 1, a certain field has 3 digits after the decimal, but has only 2 digits after the decimal in run number 2.
What am I supposed to do for this? I cannot use the read command since in that I need to specify in advance the number of digits my program has to read.
I basically need a way for my program to identify data between commas and read a value or varying precision between the commas.
For input, the decimal part of a format specifier is only used if the input field does not contain a decimal point.
For the last few decades (since the demise of punched cards), users typically expect that a numeric value that doesn't contain a decimal point is an integer value. Consequently, for input, format specifications for real numbers should always have .0 for their decimal part.
For example, after:
CHARACTER(4) :: input
REAL :: a, b
input = '1 '
READ (input, "(F4.0)") a
READ (input, "(F4.1)") b
a will have the value 1.0, and b will have the value 0.1.
(For input, it doesn't particularly matter which particular real data descriptor is used (F, E, D, or G) - they all behave the same regardless of the nature of the input.)
So, for input, all you have to worry about is getting the field width right. Once you have read a record into a string this is easy enough to do by using the INDEX intrinsic.

Parsing base 2^32 numbers to decimal (For theorically unlimited numbers)

I am working on a C++ problem where I have to print my class.
My class stores and does arithmetic and logic operations on theorically unlimited long numbers. It has an array of unsigned ints to hold the number. For example:
If the number is {a*(2^32) + b} , the class stores it as {array[0]=b , array[1]=a}.
So it is like a number of base (2^32). The problem is how do i convert this number to decimal so i can print it? Simply {a*(2^32) + b} will not do because it doesnt fit into unsigned int. I do not have to store the decimal number but just print it.
What i have got so far
I have thought of firstly converting the number to binary (which is an easy task) then printing it. But same problem arises because there is still no big enough variable to hold the multiplication.
Wild thought
I wonder if I can use my own class to hold the multiplication and with some iterative method do the printing?
I also wonder if this can be solved with some use of logarithmics?
Note: I am not allowed to use other libraries or other long types like double and longer.
Although I say this is for theorically unlimited numbers it would help if I could just find the way to print array of size 2. Then I can think about longer numbers.

Compressing a binary matrix

We were asked to find a way to compress a square binary matrix as much as possible, and if possible, to add redundancy bits to check and maybe correct errors.
The redundancy thing is easy to implement in my opinion. The complicated part is compressing the matrix. I thought about using run-length after reshaping the matrix to a vector because there will be more zeros than ones, but I only achieved a 40bits compression (we are working on small sizes) although I thought it'd be better.
Also, after run-length an idea was Huffman coding the matrix, but a dictionary must be sent in order to recover the original information.
I'd like to know what would be the best way to compress a binary matrix?
After reading some comments, yes #Adam you're right, the 14x14 matrix should be compressed in 128bits, so if I only use the coordinates (rows&cols) for each non-zero element, still it would be 160bits (since there are twenty ones). I'm not looking for an exact solution but for a useful idea.
You can only talk about compressing something if you have a distribution and a representation. That's the issue of the dictionary you have to send along: you always need some sort of dictionary of protocol to uncompress something. It just so happens that things like .zip and .mpeg already have those dictionaries/codecs. Even something as simple as Huffman-encoding is an algorithm; on the other side of the communication channel (you can think of compression as communication), the other person already has a bit of code (the dictionary) to perform the Huffman decompression scheme.
Thus you cannot even begin to talk about compressing something without first thinking "what kinds of matrices do I expect to see?", "is the data truly random, or is there order?", and if so "how can I represent the matrices to take advantage of order in the data?".
You cannot compress some matrices without increasing the size of other objects (by at least 1 bit). This is bad news if all matrices are equally probable, and you care equally about them all.
Addenda:
The answer to use sparse matrix machinery is not necessarily the right answer. The matrix could for example be represented in python as [[(r+c)%2 for c in range (cols)] for r in range(rows)] (a checkerboard pattern), and a sparse matrix wouldn't compress it at all, but the Kolmogorov complexity of the matrix is the above program's length.
Well, I know every matrix will have the same number of ones, so this is kind of deterministic. The only think I don't know is where the 1's will be. Also, if I transmit the matrix with a dictionary and there are burst errors, maybe the dictionary gets affected so... wouldnt be the resulting information corrupted? That's why I was trying to use lossless data compression such as run-length, the decoder just doesnt need a dictionary. --original poster
How many 1s does the matrix have as a fraction of its size, and what is its size (NxN -- what is N)?
Furthermore, this is an incorrect assertion and should not be used as a reason to desire run-length encoding (which still requires a program); when you transmit data over a channel, you can always add error-correction to this data. "Data" is just a blob of bits. You can transmit both the data and any required dictionaries over the channel. The error-correcting machinery does not care at all what the bits you transmit are for.
Addendum 2:
There are (14*14) choose 20 possible arrangements, which I assume are randomly chosen. If this number was larger than 128^2 what you're trying to do would be impossible. Fortunately log_2((14*14) choose 20) ~= 90bits < 128bits so it's possible.
The simple solution of writing down 20 numbers like 32,2,67,175,52,...,168 won't work because log_2(14*14)*20 ~= 153bits > 128bits. This would be equivalent to run-length encoding. We want to do something like this but we are on a very strict budget and cannot afford to be "wasteful" with bits.
Because you care about each possibility equally, your "dictionary"/"program" will simulate a giant lookup table. Matlab's sparse matrix implementation may work but is not guaranteed to work and is thus not a correct solution.
If you can create a bijection between the number range [0,2^128) and subsets of size 20, you're good to go. This corresponds to enumerating ways to descend the pyramid in http://en.wikipedia.org/wiki/Binomial_coefficient to the 20th element of row 196. This is the same as enumerating all "k-combinations". See http://en.wikipedia.org/wiki/Combination#Enumerating_k-combinations
Fortunately I know that Mathematica and Sage and other CAS software can apparently generate the "5th" or "12th" or arbitrarily numbered k-subset. Looking through their documentation, we come upon a function called "rank", e.g. http://www.sagemath.org/doc/reference/sage/combinat/subset.html
So then we do some more searching, and come across some arcane Fortran code like http://people.sc.fsu.edu/~jburkardt/m_src/subset/ksub_rank.m and http://people.sc.fsu.edu/~jburkardt/m_src/subset/ksub_unrank.m
We could reverse-engineer it, but it's kind of dense. But now we have enough information to search for k-subset rank unrank, which leads us to http://www.site.uottawa.ca/~lucia/courses/5165-09/GenCombObj.pdf -- see the section
"Generating k-subsets (of an n-set): Lexicographical
Ordering" and the rank and unrank algorithms on the next few pages.
In order to achieve the exact theoretically optimal compression, in the case of a uniformly random distribution of 1s, we must thus use this technique to biject our matrices to our output number of range <2^128. It just so happens that combinations have a natural ordering, known as ranking and unranking of combinations. You assign a number to each combination (ranking), and if you know the number you automatically know the combination (unranking). Googling k-subset rank unrank will probably yield other algorithms.
Thus your solution would look like this:
serialize the matrix into a list
e.g. [[0,0,1][0,1,1][1,0,0]] -> [0,0,1,0,1,1,1,0,0]
take the indices of the 1s:
e.g. [0,0,1,0,1,1,1,0,0] -> [3,5,6,7]
1 2 3 4 5 6 7 8 9 a k=4-subset of an n=9 set
take the rank
e.g. compressed = rank([3,5,6,7], n=9)
compressed==412 (or something, I made that up)
you're done!
e.g. 412 -binary-> 110011100 (at most n=9bits, less than 2^n=2^9=512)
to uncompress, unrank it
I'll get to 128 bits in a sec, first here's how you fit a 14x14 boolean matrix with exactly 20 nonzeros into 136 bits. It's based on the CSC sparse matrix format.
You have an array c with 14 4-bit counters that tell you how many nonzeros are in each column.
You have another array r with 20 4-bit row indices.
56 bits (c) + 80 bits (r) = 136 bits.
Let's squeeze 8 bits out of c:
Instead of 4-bit counters, use 2-bit. c is now 2*14 = 28 bits, but can't support more than 3 nonzeros per column. This leaves us with 128-80-28 = 20 bits. Use that space for array a4c with 5 4-bit elements that "add 4 to an element of c" specified by the 4-bit element. So, if a4c={2,2,10,15, 15} that means c[2] += 4; c[2] += 4 (again); c[10] += 4;.
The "most wasteful" distribution of nonzeros is one where the column count will require an add-4 to support 1 extra nonzero: so 5 columns with 4 nonzeros each. Luckily we have exactly 5 add-4s available.
Total space = 28 bits (c) + 20 bits
(a4c) + 80 bits (r) = 128 bits.
Your input is a perfect candidate for a sparse matrix. You said you're using Matlab, so you already have a good sparse matrix built for you.
spm = sparse(dense_matrix)
Matlab's sparse matrix implementation uses Compressed Sparse Columns, which has memory usage on the order of 2*(# of nonzeros) + (# of columns), which should be pretty good in your case of 20 nonzeros and 14 columns. Storing 20 values sure is better than storing 196...
Also remember that all matrices in Matlab are going to be composed of doubles. Just because your matrix can be stored as a 1-bit boolean doesn't mean Matlab won't stick it into a 64-bit floating point value... If you do need it as a boolean you're going to have to make your own type in C and use .mex files to interface with Matlab.
After thinking about this again, if all your matrices are going to be this small and they're all binary, then just store them as a binary vector (bitmask). Going off your 14x14 example, that requires 196 bits or 25 bytes (plus n, m if your dimensions are not constant). That same vector in Matlab would use 64 bits per element, or 1568 bytes. So storing the matrix as a bitmask takes as much space as 4 elements of the original matrix in Matlab, for a compression ratio of 62x.
Unfortunately I don't know if Matlab supports bitmasks natively or if you have to resort to .mex files. If you do get into C++ you can use STL's vector<bool> which implements a bitmask for you.