Nand2Tetris-Obtaining Register from RAM chips - nand2tetris

I've recently completed Chapter 3 of the associated textbook for this course, The Elements of Computing Systems, Second Edition.
While I was able to implement all of the chips described in this chapter, I am still trying to wrap my head around how exactly the RAM chips work. I think I understand them in theory (e.g. a RAM4K chip is built from a set of 8 RAM512 chips, each of which is itself a set of 8 RAM64 chips).
What I am unsure about is actually using the chips. For example, suppose I try to output a single register from RAM16K using this code, given an address:
CHIP RAM16K {
    IN in[16], load, address[14];
    OUT out[16];

    PARTS:
    Mux4Way16(a=firstRam, b=secondRam, c=thirdRam, d=fourthRam, sel=address[12..13], out=out);
    And(a=load, b=load, out=shouldLoad);
    DMux4Way(in=shouldLoad, sel=address[12..13], a=setRamOne, b=setRamTwo, c=setRamThree, d=setRamFour);
    RAM4K(in=in, load=setRamOne, address=address[0..11], out=firstRam);
    RAM4K(in=in, load=setRamTwo, address=address[0..11], out=secondRam);
    RAM4K(in=in, load=setRamThree, address=address[0..11], out=thirdRam);
    RAM4K(in=in, load=setRamFour, address=address[0..11], out=fourthRam);
}
How does the above code get at the underlying register? If I understand the description of the chip correctly, it is supposed to return a single register. I can see that it selects the output of one RAM4K based on two of the address bits -- does it also reach the base register itself recursively through the chips further down? Why doesn't this code produce an error if it's outputting a whole RAM4K when we expect a single register?

It's been a while since I did the course so please excuse any minor errors below.
Each RAM chip (whatever the size) consists of an array of smaller chips. If you are implementing a 16K chip with 4K subchips, then there will be 4 of them.
So you would use 2 bits of the incoming address to select which sub-chip you need to work with, and the remaining 12 bits are sent on to all the sub-chips. It doesn't matter how you divide up the bits, as long as you have a set of 2 and a set of 12.
Specifically, the 2 select bits are used to route the load signal to just one sub-chip (ie: using a DMux4Way), so loads only affect that one sub-chip, and they are also used to pick which of the sub-chips outputs are used (ie: a Mux4Way16).
When I was doing it, I found that the simplest way to do things was to always use the least-significant bits as the select bits. So for example, my RAM64 chip used address[0..2] as the select bits, and passed address[3..5] to the RAM8 sub-chips.
The thing that may be confusing you is that in these kinds of circuits, all of the sub-chips are activated. It's just that you use the select bits to decide which sub-chip's output to pass on to the outputs, and also as a filter to decide which sub-chip might perform a load.
As the saying goes, "It's turtles (or ram chips) all the way down."
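To make the recursion concrete, here is a small Python sketch (not HDL, and not part of the course materials) that models the same structure: each RAM level holds four sub-chips, like the RAM16K above, a slice of the address picks which sub-chip to descend into, and at the bottom of the recursion sits a single 16-bit register. Unlike the real hardware, where every sub-chip produces an output and the Mux picks one, the model simply follows one path, but the address arithmetic is the same.

class RAM:
    """Hierarchical RAM model: 4 sub-chips per level, a single register at the bottom."""
    def __init__(self, address_bits):
        self.address_bits = address_bits
        if address_bits == 0:
            self.value = 0                                     # one 16-bit register
        else:
            # 2 select bits per level -> 4 sub-chips, each needing 2 fewer address bits
            self.sub = [RAM(address_bits - 2) for _ in range(4)]

    def access(self, address, load=False, value=0):
        if self.address_bits == 0:
            if load:
                self.value = value & 0xFFFF
            return self.value
        select = (address >> (self.address_bits - 2)) & 0b11   # top 2 bits pick a sub-chip
        rest = address & ((1 << (self.address_bits - 2)) - 1)  # the rest go down to it
        return self.sub[select].access(rest, load, value)

ram16k = RAM(14)                                # 14 address bits, like RAM16K
ram16k.access(12345, load=True, value=0xBEEF)
print(hex(ram16k.access(12345)))                # 0xbeef -- the one register the address names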

Related

Convert EM4x02 ID to Hitag2 Value

I've been working on an RFID project to produce our own RFID cards to work on our existing timeclocks and readers.
I've got most of the work done, and have been able to successfully write a Hitag2 card using the values of pages 4 & 5 from another card (so basically copying the card), then changing the config bit, which makes it act like an EM4x02 and allows our readers to read it.
What I'm struggling with is trying to relate the hex code on pages 4/5 to the output you get when scanning it as an EM4x.
The values of the hitag page 4/5 are FF800000/003EDF10. This translates to 0000001EBC when read as an EM4x.
Does anybody have an idea on how this translation is done? I've tried using the methods in RFIDIOT but that doesn't seem to work for this.
I've managed to find how this is done after finding a Hitag2 datasheet from 1999 (the only one I could find that explains the bits when the Hitag is in public mode A).
Firstly, convert the number you want on the EM4 card to hex.
Convert that hex into binary.
Split the binary into 4 bit chunks, then work out the even parity for each section and add it to the end of each chunk. (So you'll end up with 5 bits per chunk)
Then, work out the even parity of each column in the data (i.e. the first bit of all chunks, then the second, etc., ignoring the parity bits you added) and add these 4 bits to the binary string.
Then add the correct number of zeros at the start to ensure the data section has 50 bits.
Once you have the data section sorted, add 9 bits of 1 to the beginning (header) and a final 0 to the very end of the binary.
Your whole binary string should be 64 bits long.
Convert this to hex and split it in half. You can then write these onto pages 4/5 of a Hitag2 card.
You then need to change the configuration bit to 0x02 for the tag to work in public mode A.
Just thought I would send you the diagram of how this works: [diagram: EM4X tag data layout]
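For anyone who wants to check the procedure, here is a small Python sketch of the steps above (my own illustration, not the original poster's code; the function name em4x_to_hitag_pages is made up). Instead of padding with zeros at the end, it starts from ten 4-bit chunks, which comes to the same thing because all-zero chunks contribute nothing to either parity.

def em4x_to_hitag_pages(em_id):
    """Build the 64-bit stream described above and split it into the two
    32-bit words to write to Hitag2 pages 4 and 5."""
    # the 40-bit EM4x ID as ten 4-bit chunks, most significant first
    nibbles = [(em_id >> (4 * i)) & 0xF for i in reversed(range(10))]

    bits = "1" * 9                                   # 9-bit header of ones
    for n in nibbles:
        row = f"{n:04b}"
        bits += row + str(row.count("1") % 2)        # 4 data bits + even row parity
    for col in range(4):                             # even parity of each column,
        ones = sum((n >> (3 - col)) & 1 for n in nibbles)   # row-parity bits excluded
        bits += str(ones % 2)
    bits += "0"                                      # final stop bit

    assert len(bits) == 64
    word = int(bits, 2)
    return f"{word >> 32:08X}", f"{word & 0xFFFFFFFF:08X}"

print(em4x_to_hitag_pages(0x0000001EBC))             # ('FF800000', '003EDF10')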

Max number of words to jump over in the PC-relative addressing mode

I'm reading a book by David Patterson and John Hennessy titled Computer Organization and Design. In the RISC-V instruction set, which the book is about, there are two instruction formats related to jumping - SB-type and UJ-type. The former uses a 12-bit constant to represent the offset (in bytes) to jump from the current instruction and the latter uses a 20-bit constant to represent the same. Then the authors say the following:
Since the program counter (PC) contains the address of the current instruction, we can branch (SB-type) within ±2^10 words of the current instruction, or jump (UJ) within ±2^18 words of the current instruction, if we use the PC as the register to be added to the address.
I don't understand how they get those 2^10 and 2^18. Since the constant the instructions use is two's complement, it can represent values from -2^11 to 2^11 - 1 in the first case and from -2^19 to 2^19 - 1 in the second case. Since these constants represent bytes, but we want to know how many words we can jump over, we need to divide the maximum number of bytes by four, so the most we can get is 2^11 / 2^2 = 2^9 words in the first case and 2^17 words in the second.
Could someone please take a look at my calculations above and point out what I'm missing and what's wrong with my calculations and thoughts?
UPDATE:
Probably I didn't understand the authors correctly. Could it be that they mean the lower bound (-2^10) and the upper bound (+2^10)? That is, that we can never jump further than 2^10 words from the current instruction?
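For comparison, here is the arithmetic both ways (my own check, not part of the question or the book): treating the immediate as a byte count gives the ±2^9 and ±2^17 figures from the question, while treating it as a count of two-byte units -- which is how the actual B-type and J-type encodings work, with an implicit zero in bit 0 -- reproduces the book's ±2^10 and ±2^18.

# imm_bits: width of the encoded immediate field (12 for branches, 20 for jumps)
for name, imm_bits in (("branch (SB-type)", 12), ("jump (UJ-type)", 20)):
    as_bytes = 2 ** (imm_bits - 1) // 4       # immediate counts bytes, 4 bytes per word
    as_halfwords = 2 ** imm_bits // 4         # immediate counts two-byte units
    print(f"{name}: +/-2^{as_bytes.bit_length() - 1} words if it counts bytes, "
          f"+/-2^{as_halfwords.bit_length() - 1} words if it counts halfwords")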

PIC Assembly: Calling functions with variables

So say I have a variable, which holds a song number. -> song_no
Depending upon the value of this variable, I wish to call a function.
Say I have many different functions:
Fcn1
....
Fcn2
....
Fcn3
So for example,
If song_no = 1, call Fcn1
If song_no = 2, call Fcn2
and so forth...
How would I do this?
You should have a compare instruction in the instruction set (the post suggests you are looking for an assembly solution); the result usually sets a flag bit or a value in a register, but you need to check the instruction set for that.
The code should look something like:
load(song_no, R1)      // pseudo-assembly; check your PIC's actual instruction set
cmpeq(1, R1)           // compare R1 with the literal 1, setting an "equal" flag
jmpe Fcn1              // jump if equal
cmpeq(2, R1)
jmpe Fcn2
....
Hope this helps
I'm not well acquainted with the PIC, but these sorts of things are usually implemented as a jump table. In short, put pointers to the target routines in an array and call/jump to the entry indexed by your song_no. You just need to calculate the address into the array somehow, so it is very efficient. No compares necessary.
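The same idea in a high-level language, purely to illustrate the concept (this is Python, not PIC code, and the function names are made up): an array of routine references indexed by song_no, so no compare chain is needed.

def fcn1(): print("playing song 1")
def fcn2(): print("playing song 2")
def fcn3(): print("playing song 3")

jump_table = [fcn1, fcn2, fcn3]    # entry i-1 handles song_no == i

song_no = 2
jump_table[song_no - 1]()          # direct indexed dispatch, no compares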
To elaborate on Jens' reply, the traditional way of doing this on 12/14-bit PICs is the same way you would look up constant data from ROM, except that instead of returning a number with RETLW you jump forward to the desired routine with GOTO. The actual jump into the jump table is performed by adding the offset to the program counter.
Something along these lines:
movlw high(table)      ; preload PCLATH with the upper bits of the table's address
movwf PCLATH
movf song_no,w
addlw low(table)       ; does table + offset cross into the next 256-word page?
btfsc STATUS,C
incf PCLATH,f          ; it does, so carry into PCLATH by hand
movf song_no,w         ; reload the plain offset
addwf PCL,f            ; computed goto: writing PCL performs the jump
table:
goto fcn1
goto fcn2
goto fcn3
.
.
.
Unfortunately there are some subtleties here.
The PIC16 only has an eight-bit accumulator while the address space to jump into is 11 bits. Therefore both a directly writable low byte (PCL) and a latched high-byte register (PCLATH) are available. The value in the latch is applied as the most significant bits once the jump is taken.
The jump table may cross a page, hence the manual carry into PCLATH. Omit the BTFSC/INCF if you know the table will always stay within a 256-instruction page.
The ADDWF instruction will already have been fetched, and the program counter will be pointing at table, by the time PCL is added to. Therefore a 0 offset jumps to the first table entry.
Unlike on the PIC18, each GOTO instruction fits in a single 14-bit instruction word, and PCL addresses instructions, not bytes, so the offset should not be multiplied by two.
All things considered you're probably better off searching for general PIC16 tutorials. Any of these will clearly explain how data/jump tables work, not to mention begin with the basics of how to handle the chip. Frankly it is a particularly convoluted architecture and I would advise staying with the "free" HI-TECH C compiler unless you particularly enjoy logic puzzles or desperately need the performance.

Binary Calculator using Picaxe microchip?

I want to be a complete nerd and make a very simple binary calculator.
It will be two rows of 8 switches, each switch representing a bit, so a row is a byte (a number); the two rows are added together, and a row of 9 LEDs will display the result in binary.
Is this possible to do with a picaxe microchip?
If not, what could I do it with?
Cheers,
Nick
Your problem would be the data input/output lines. The basic idea is trivial on any microcontroller; the issue is the number of input/output pins available.
You might want to look into several shift registers (one per row and one per output) so you can marshal the bits in on a single pin or two and out on a single pin.
Specifically:
74hc165n parallel-in/serial-out for the inputs
74hc595 for the output.
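Purely to illustrate the data flow (this is Python pseudocode, not PICAXE BASIC, and the read_pin/write_pin/clock helpers are hypothetical): clock 8 bits in from each 74HC165, add the two bytes, then clock the 9-bit sum out to the 74HC595 driving the LEDs.

def read_byte(read_pin, clock):
    """Shift 8 bits in from a 74HC165-style parallel-in/serial-out register."""
    value = 0
    for _ in range(8):
        value = (value << 1) | read_pin()    # sample the serial data line
        clock()                              # pulse the shift clock
    return value

def write_bits(value, nbits, write_pin, clock):
    """Shift nbits out, MSB first, to a 74HC595-style serial-in/parallel-out register."""
    for i in reversed(range(nbits)):
        write_pin((value >> i) & 1)
        clock()

# a = read_byte(...); b = read_byte(...)     # the two switch rows
# write_bits(a + b, 9, ...)                  # an 8-bit + 8-bit sum always fits in 9 bits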

Creating a logic gate simulator

I need to make an application for creating logic circuits and seeing the results. This is primarily for use in A-Level (UK, 16-18 year olds generally) computing courses.
I've never made any applications like this, so I'm not sure of the best design for storing the circuit and evaluating the results (at a reasonable speed, say 100 Hz on a 1.6 GHz single-core computer).
Rather than have the circuit built from the basic gates (and, or, nand, etc.) I want to allow these gates to be used to make "chips" which can then be used within other circuits (e.g. you might want to make an 8-bit register chip, or a 16-bit adder).
The problem is that the number of gates increases massively with such circuits, so that if the simulation worked on each individual gate it would have thousands of gates to simulate; I need to simplify these components that can be placed in a circuit so they can be simulated quickly.
I thought about generating a truth table for each component, so the simulation could use a lookup table to find the outputs for a given input. The problem occurred to me though that the size of such tables increases massively with the number of inputs. If a chip had 32 inputs, then the truth table would need 2^32 rows. This uses a massive amount of memory, in many cases more than there is to use, so it isn't practical for non-trivial components. It also won't work with chips that can store their state (e.g. registers), since they can't be represented as a simple table of inputs and outputs.
I know I could just hardcode things like register chips, however since this is for educational purposes I want people to be able to make their own components as well as view and edit the implementations of standard ones. I considered allowing such components to be created and edited using code (e.g. DLLs or a scripting language), so that an adder for example could be represented as "output = inputA + inputB". However, that assumes the students have done enough programming in the given language to be able to understand and write such plugins to mimic the results of their circuit, which is likely not to be the case...
Is there some other way to take a boolean logic circuit and simplify it automatically so that the simulation can determine the outputs of a component quickly?
As for storing the components I was thinking of storing some kind of tree structure, such that each component is evaluated once all components that link to its inputs are evaluated.
eg consider: A.B + C
The simulator would first evaluate the AND gate, and then evaluate the OR gate using the output of the AND gate and C.
However, it just occurred to me that cases where the outputs link back round to the inputs will cause a deadlock, because their inputs will never all be evaluated... How can I overcome this, since the program can only evaluate one gate at a time?
Have you looked at Richard Bowles's simulator?
You're not the first person to want to build their own circuit simulator ;-).
My suggestion is to settle on a minimal set of primitives. When I began mine (which I plan to resume one of these days...) I had two primitives:
Source: zero inputs, one output that's always 1.
Transistor: two inputs A and B, one output that's A and not B.
Obviously I'm misusing the terminology a bit, not to mention neglecting the niceties of electronics. On the second point I recommend abstracting to wires that carry 1s and 0s like I did. I had a lot of fun drawing diagrams of gates and adders from these. When you can assemble them into circuits and draw a box round the set (with inputs and outputs) you can start building bigger things like multipliers.
If you want anything with loops you need to incorporate some kind of delay -- so each component needs to store the state of its outputs. On every cycle you update all the new states from the current states of the upstream components.
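A minimal Python sketch of that idea (my own illustration, not the answerer's code): every component latches its output, and each cycle computes all the new outputs from the previous cycle's values, which is what keeps feedback loops well defined. The two primitives mirror the Source and Transistor described above.

# Each component: (function from list of input values to output, list of upstream names).
components = {
    "one":  (lambda ins: 1, []),                                 # Source: always 1
    "notq": (lambda ins: ins[0] & (1 - ins[1]), ["one", "q"]),   # Transistor: A and not B (feedback via q)
    "q":    (lambda ins: ins[0], ["notq"]),                      # a plain wire/buffer
}

state = {name: 0 for name in components}        # every output starts at 0

for cycle in range(4):
    # compute all new outputs from the *current* states, then swap in the new state
    state = {name: fn([state[u] for u in ups]) & 1
             for name, (fn, ups) in components.items()}
    print(cycle, state)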
Edit: Regarding your concerns about scalability, how about defaulting to the first-principles method of simulating each component in terms of its state and upstream neighbours, but providing ways of optimising subcircuits:
If you have a subcircuit S with inputs A[m] with m < 8 (say, giving a maximum of 256 rows) and outputs B[n] and no loops, generate the truth table for S and use that (see the sketch after this list). This could be done automatically for identified subcircuits (and reused if the subcircuit appears more than once) or by choice.
If you have a subcircuit with loops, you may still be able to generate a truth table. There are fixed-point finding methods which can help here.
If your subcircuit has delays (and they are significant to the enclosing circuit) the truth table can incorporate state columns. E.g. if the subcircuit has input A, inner state B, and output C, where C <- A and B, B <- A, the truth table could be (writing B' for the next state):
A B | B' C
0 0 | 0 0
0 1 | 0 0
1 0 | 1 0
1 1 | 1 1
If you have a subcircuit that the user asserts implements a particular known pattern such as "adder", provide an option for using a hard-coded implementation for updating that subcircuit instead of by simulating its inner parts.
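Here is the sketch promised above -- a small Python illustration of the first option, exhaustively tabulating a loop-free subcircuit, under the assumption that the subcircuit can be exposed as a plain function of its input bits (the half_adder below is just an example):

from itertools import product

def tabulate(subcircuit, n_inputs):
    """Evaluate a loop-free subcircuit for every input combination (max ~8 inputs,
    i.e. 256 rows) and return a lookup table: input bit tuple -> output bit tuple."""
    return {bits: subcircuit(*bits) for bits in product((0, 1), repeat=n_inputs)}

def half_adder(a, b):
    return (a ^ b, a & b)            # (sum, carry) built from primitive gates

table = tabulate(half_adder, 2)
print(table[(1, 1)])                 # (0, 1) -- looked up, not re-simulated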
When I made a circuit emulator (sadly, also incomplete and also unreleased), here's how I handled loops:
Each circuit element stores its boolean value
When an element "E0" changes its value, it notifies (via the observer pattern) all who depend on it
Each observing element evaluates its new value and does likewise
When the E0 change occurs, a level-1 list is kept of all elements affected. If an element already appears on this list, it gets remembered in a new level-2 list but doesn't continue to notify its observers. When the sequence which E0 began has stopped notifying new elements, the next queue level is handled. I.e. the sequence is followed and completed for the first element added to level 2, then for the next added to level 2, and so on, until all of level x is complete; then you move on to level x+1.
This is in no way complete. If you ever have multiple oscillators doing infinite loops, then no matter what order you take them in, one could prevent the other from ever getting its turn. My next goal was to alleviate this by limiting steps with clock-based sync'ing instead of cascading combinatorials, but I never got this far in my project.
You might want to take a look at the From Nand To Tetris in 12 steps course software. There is a video talking about it on youtube.
The course page is at: http://www1.idc.ac.il/tecs/
If you can disallow loops (outputs linking back to inputs), then you can significantly simplify the problem. In that case, for every input there will be exactly one definite output. Cycles, however, can make the output undecidable (or rather, constantly changing).
Evaluating a circuit without loops should be easy - just use the BFS algorithm with "junctions" (connections between logic gates) as the items in the list. Start off with all the inputs to all the gates in an "undefined" state. As soon as a gate has all inputs "defined" (either 1 or 0), calculate its output and add its output junctions to the BFS list. This way you only have to evaluate each gate and each junction once.
If there are loops, the same algorithm can be used, but the circuit can be built in such a way that it never comes to a "rest" and some junctions are always changing between 1 and 0.
Oops -- actually, this algorithm can't be used in that case, because the looped gates (and gates depending on them) would forever stay "undefined".
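A small Python sketch of that forward-propagation idea for loop-free circuits (my own illustration, queuing gates rather than junctions, but to the same effect): a gate is evaluated as soon as all of its inputs are defined, so each gate is evaluated only once.

from collections import deque

# Each gate: (function, input junction names, output junction name).
gates = [
    (lambda a, b: a & b, ["A", "B"], "AB"),
    (lambda a, b: a | b, ["AB", "C"], "OUT"),       # A.B + C from the question
]

def evaluate(primary_inputs):
    values = dict(primary_inputs)                   # junction -> 0/1; others undefined
    queue = deque(gates)
    while queue:                                    # loop-free circuits only: a loop
        fn, ins, out = queue.popleft()              # would keep re-queueing forever,
        if all(i in values for i in ins):           # which is exactly the problem above
            values[out] = fn(*(values[i] for i in ins))
        else:
            queue.append((fn, ins, out))            # inputs not all defined yet
    return values

print(evaluate({"A": 1, "B": 0, "C": 1})["OUT"])    # 1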
You could introduce them to the concept of Karnaugh maps, which would help them simplify truth values for themselves.
You could hard code all the common ones. Then allow them to build their own out of the hard coded ones (which would include low level gates), which would be evaluated by evaluating each sub-component. Finally, if one of their "chips" has less than X inputs/outputs, you could "optimize" it into a lookup table. Maybe detect how common it is and only do this for the most used Y chips? This way you have a good speed/space tradeoff.
You could always JIT compile the circuits...
As I haven't really thought about it, I'm not really sure what approach I'd take.. but it would possibly be a hybrid method and I'd definitely hard code popular "chips" in too.
When I was playing around making a "digital circuit" simulation environment, I had each defined circuit (a basic gate, a mux, a demux and a couple of other primitives) associated with a transfer function (that is, a function that computes all outputs based on the present inputs), an "agenda" structure (basically a linked list of "when to activate a specific transfer function" entries), virtual wires and a global clock.
I arbitrarily set the wires to hard-modify the inputs whenever the output changed and the act of changing an input on any circuit to schedule a transfer function to be called after the gate delay. With this at hand, I could accommodate both clocked and unclocked circuit elements (a clocked element is set to have its transfer function run at "next clock transition, plus gate delay", any unclocked element just depends on the gate delay).
Never really got around to build a GUI for it, so I've never released the code.
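For anyone who wants to experiment with that agenda idea, here is a compact Python sketch in the same spirit (my own code, not the answerer's, and deliberately simplified): wires push changed values to the gates listening to them, and gates respond by scheduling their transfer functions on a time-ordered agenda one gate delay later.

import heapq

class Simulator:
    def __init__(self):
        self.time = 0
        self.agenda = []                       # min-heap of (time, seq, action)
        self.seq = 0                           # tie-breaker so actions are never compared

    def schedule(self, delay, action):
        heapq.heappush(self.agenda, (self.time + delay, self.seq, action))
        self.seq += 1

    def run(self, until):
        while self.agenda and self.agenda[0][0] <= until:
            self.time, _, action = heapq.heappop(self.agenda)
            action()

class Wire:
    def __init__(self):
        self.value = 0
        self.listeners = []                    # gates to poke when the value changes

    def set(self, value):
        if value != self.value:
            self.value = value
            for listener in self.listeners:
                listener()

def nand_gate(sim, a, b, out, delay=1):
    """Transfer function out = not (a and b), applied one gate delay after an input change."""
    def react():
        new = 0 if (a.value and b.value) else 1
        sim.schedule(delay, lambda v=new: out.set(v))
    a.listeners.append(react)
    b.listeners.append(react)
    react()                                    # compute the initial output

sim = Simulator()
a, b, out = Wire(), Wire(), Wire()
nand_gate(sim, a, b, out)
sim.run(until=2)
print(out.value)                               # 1: NAND of the initial 0, 0
a.set(1); b.set(1)
sim.run(until=10)
print(out.value)                               # 0, one gate delay after the inputs changed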