How does Parity's Aura consensus protocol work? - ethereum

Here it's a very high level description with only formulas. I want to understand actually how it works.
I don't actually understand what a step is and what's it's use? Does a node always keep updating the step? And when time to create to create and broadcast a block comes it will take the current step value and check if he should broadcast or not.
What do you mean by "Blocks from more than 1 step into the future are rejected."? Does this mean that if block time is 5 seconds then the next block timestamp should be exactly 5 seconds higher.
And also what happens when the next primary doesn't broadcast? How does the network deal with it? All the next blocks should get invalidated right because they won't follow a timestamp difference of 5 seconds.

AuRa is the name for Parity's Proof-of-Authority (PoA) consensus engine, the name originally comes from Authority Round (used to be AuRo). It's used in the Kovan network.
PoA networks are permissioned not public by design. Only strictly defined authority nodes are allowed to seal blocks. This is very useful for test networks or enterprise networks where the native tokens on the blockchain are not holding any value and therefore would be easy to attack in a Proof-of-Work (PoW) or Proof-of-Stake (PoS) environment.
A step is one part of the authority round. Each authority can seal one block in each round. Let's say we have five authorities: 0x0a .. 0x0e. These would be the steps, as defined in the chain specification or in the dynamic validator contract:
Step 1: 0x0a seals a block
Step 2: 0x0b seals a block
Step 3: 0x0c seals a block
Step 4: 0x0d seals a block
Step 5: 0x0e seals a block
After the round is finished, it starts over again.
What do you mean by "Blocks from more than 1 step into the future are rejected."?
Now if The node 0x0c would try to seal a block right after 0x0a, then this block would be more than 1 step into the future. The block sealing strickly relies on the block step order of all authorities.
And also what happens when the next primary doesn't broadcast?
That's no problem, there will be a gap between two blocks, i.e., doubled block time. So if 0x0c notices that 0x0b is not providing a block in the specified time window, it can override this step with its own block and the round goes on. There are certain tolerances on the block timestamps to make sure the network does not stall.
In this screenshot above, you can see that two authorities in the Kovan network are not sealing blocks. The result is an increased block time between these steps.
Disclosure: I work for Parity.

Related

Nand2Tetris-Obtaining Register from RAM chips

I've recently completed Chapter 3 of the associated textbook for this course: The Elements of Computing, Second Edition.
While I was able to implement all of the chips described in this chapter, I am still trying to wrap my head around how exactly the RAM chips work. I think I understand them in theory (e.g. a Ram4K chip stores a set of 8 RAM512 chips, which itself is a set of 8 RAM64 chips).
What I am unsure about is actually using the chips. For example, suppose I try to output a single register from RAM16K using this code, given an address:
CHIP RAM16K {
IN in[16], load, address[14];
OUT out[16];
PARTS:
Mux4Way16(a=firstRam, b=secondRam, c=thirdRam, d=fourthRam, sel=address[12..13], out=out);
And(a=load, b=load, out=shouldLoad);
DMux4Way(in=shouldLoad, sel=address[12..13], a=setRamOne, b=setRamTwo, c=setRamThree, d=setRamFour);
RAM4K(in=in, load=setRamOne, address=address[0..11], out=firstRam);
RAM4K(in=in, load=setRamTwo, address=address[0..11], out=secondRam);
RAM4K(in=in, load=setRamThree, address=address[0..11], out=thirdRam);
RAM4K(in=in, load=setRamFour, address=address[0..11], out=fourthRam);
}
How does the above code get the underlying register? If I understand the description of the chip correctly, it is supposed to return a single register. I can see that it outputs a RAM4K based on a series of address bits -- does it also get the base register itself recursively through the chips at the bottom? Why doesn't this code have an error if it's outputting a RAM4K when we expect a register?
It's been a while since I did the course so please excuse any minor errors below.
Each RAM chip (whatever the size) consists of an array of smaller chips. If you are implementing a 16K chip with 4K subchips, then there will be 4 of them.
So you would use 2 bits of the incoming address to select what sub-chip you need to work with, and the remaining 12 bits are sent on to all the sub-chip. It doesn't matter how you divide up the bits, as long as you have a set of 2 and a set of 12.
Specifically, the 2 select bits are used to route the load signal to just one sub-chip (ie: using a DMux4Way), so loads only affect that one sub-chip, and they are also used to pick which of the sub-chips outputs are used (ie: a Mux4Way16).
When I was doing it, I found that the simplest way to do things was always use the least-significant bits as the select bits. So for example, my RAM64 chip used address[0..2] as the select bits, and passed address[3..5] to the RAM8 sub-chips.
The thing that may be confusing you is that in these kinds of circuits, all of the sub-chips are activated. It's just that you use the select bits to decide which sub-chip's output to pass on to the outputs, and also as a filter to decide which sub-chip might perform a load.
As the saying goes, "It's turtles (or ram chips) all the way down."

Looking for a _one byte_ invalid opcode with x86

I need an invalid opcode with x86 (not x64!) that's exactly one byte in length to overwrite some code in a foreign process. Currently I'm using INT3 (0xCC) but it would be nicer to trap an invalid opcode separately since the foreign process contains a lot of valid INT3.
According to http://ref.x86asm.net/coder32.html, there aren't any in 32-bit mode guaranteed to #UD. Anything that wasn't nailed down has been reused as building material for new extensions.
The ones that exist in 64-bit mode are reserved and not guaranteed to fault on future CPUs; only ud2 is truly guaranteed future-proof. Assuming x86-64 lasts long enough, likely some vendor will make use of that 64-bit-only coding space and stop wasting code-size to also cater to increasingly obsolete 32-bit mode.
If you don't need #UD, you can raise #GP(0) with some privileged instructions in user-space, assuming you're never going to be running in kernel mode.
F4 hlt will always #GP(0) in user-space, not enabled by IOPL, only true CPL=0. (Or #UD if used with a lock prefix). Even if it somehow gets executed in a kernel context, it just stops and waits for the next interrupt, so typically no effect on correctness unless executed with interrupts disabled. (In which case you're stuck until the next NMI).
A similar but worse option is FB sti. But it can execute successfully in a program that's used Linux iopl(), like an X11 server. Unless interrupts were supposed to be disabled, though, that's still not going to lock up your system, it just won't trigger the exception you were looking for. (Unlike cli which could get that CPU stuck, or in al, dx which could do wild IO and even be allowed by ioperm not just iopl, depending on what value is in DX.)
Depending what comes next in memory, 9A callf ptr16:32 might fault on trying to load an invalid value into CS. That value would come from the 2 bytes of machine code 5 and 6 bytes after this one (i.e. after a 32-bit new EIP, since ptr16:32 is stored little-endian). Unlike call rel32 or whatever, it may fault before actually pushing anything and overwriting the current CS:EIP. (But if not, in theory your debugger could simulate popping that far-return address back into CS:EIP after catching the fault.)
Just to be clear, I'm suggesting overwriting a byte with 9A, and leaving the later bytes of machine code unmodified, after checking that the bytes that would be the new CS value are in fact invalid. e.g. by making sure a far call to that address segfaults. Or if this is near the end of a page, and the next is unmapped, it can #PF.
The F0 lock prefix faults with #UD if used on things other than a memory-destination RMW operation, so it can also work if later context would decode as any other instruction. But you can't always use it; you need to check that you aren't creating a valid atomic RMW instruction. e.g. if the ModRM byte was 00 or 01, replacing the opcode with a lock prefix creates a memory-destination add.
#ecm points out that f1 on some CPUs is icebp / int1, but on other CPUs where it isn't, it's undefined but doesn't raise #UD. (http://ref.x86asm.net/coder32.html#xF1)
If the following byte is 0, D4 00 aam 0 is guaranteed to #DE (divide exception). But any other value does immediate 8-bit division of AL.
Depending what byte comes next, CD int n can be used. But not for all following bytes, e.g. int 0x80 won't fault under Linux (unless your kernel is built without CONFIG_IA32_EMULATION). And you might not want some of the other random interrupt numbers. e.g. CD 03 int 3 is pretty much like CC int3.

MIPS Pipeline forwarding: How to forward to the second succeeding instruction?

Say, for example, I have 3 instructions: 1, 2, and 3.
I want to forward data from instruction 1 to instruction 3. The catch is, I can only forward from the EX/MEM register of instruction 1.
So we have:
1: IF ID EX MEM WB
2: IF ID EX MEM WB
3: IF ID EX MEM WB
and I want to forward from EX/MEM of 1 to ID/EX of 3.
This is part of a homework problem, and apparently I need to stall an instruction. I don't see how this would help anything in the slightest, since it already makes no sense for me to forward data forward in time.
Problem in question:
Answer:
Thanks for any help
... since it already makes no sense for me to forward data forward in time.
Data can only be forwarded forwards in time, not backwards, since as of 2021, we still don't have time machines.
So, forwarding necessarily feeds information available in the processor generated just right now, to somewhere else in the processor, so it can be used in the future (i.e. the next cycle).
Forwarding and stalling are both ways to mitigate a RAW hazard — the idea is simply to get a value from where generated to where needed.
If the "where needed" is earlier in time than the "where generated", then a stall is required.  However, if the "where needed" is later in time than the "where generated" then a forward can mitigate the hazard.  The hazard is caused by the pipeline's assumption that "where needed" is the register file, which is incorrect in back to back operations.
Some hazards require both a forward and a stall, as the best that can be done.
But all hazards can be mitigated with sufficient stalling and without forwarding, though that will reduce performance.  With sufficient stalling, the "where needed" and the "where generated" can both be the register file.
The catch is, I can only forward from the EX/MEM register of instruction 1.  and I want to forward from EX/MEM of 1 to ID/EX of 3.
We cannot forward from EX/MEM of 1 directly to ID/EX of 3.  Why?  Because, ID/EX of 3 is two cycles further along, so the data we want to forward from there (EX/MEM of 1) is no longer there: it has moved down the pipeline to the next stage.  By the time that ID/EX of 3 wants that data, that data is now in MEM/WB (e.g. of 1) — EX/MEM at that time is doing instruction 2.

Max number of words to jump over in the PC-relative addressing mode

I'm reading a book of David Patterson and John Hennesy titled: Computer Organization and Design. In the RISC-V architecture set which the book is about there are two instruction formats related to jumping - SB-type and UJ-type. The former uses 12-bit constant to represent offset (in bytes) to jump from the current instruction and the latter uses 20-bit constant to represent the same. Then the author says the following:
Since the program counter (PC) contains the address of the current instruction, we can branch (SB-type) within +=2^10 words of the current instruction, or jump (UJ) within += 2^18 words of the current instruction, if we use the PC as the register to be added to the address.
I don't understand how they get those 2^10 and 2^18. Since the constant the instructions use is two's complement, then it can represent values from -2^11 to 2^11 - 1 in the first case and -2^19 to 2^19 - 1 in the second case. Since these constants represent bytes, but we want to know how many words we can jump over, therefore we need to divide max value of bytes by four, so the max which we can get is 2^11 / 2^2 = 2^9 words in the first case and 2^17 in the second one.
Could someone please take a look at my calculations above and point me out to what I'm missing and what's wrong with my calculations and thoughts?
UPDATE:
Probably I didn't understand the author correctly. May it be the case that they mean the lower-bound (-2^10) and upper-bound (+2^10)? So they mean that we can never jump beyond 2^10 from the current instruction?

My simple turing machine

I'm trying to understand and implement the simplest turing machine and would like feedback if I'm making sense.
We have a infinite tape (lets say an array called T with pointer at 0 at the start) and instruction table:
( S , R , W , D , N )
S->STEP (Start at step 1)
R->READ (0 or 1)
W->WRITE (0 or 1)
D->DIRECTION (0=LEFT 1=RIGHT)
N->NEXTSTEP (Non existing step is HALT)
My understanding is that a 3-state 2-symbol is the simplest machine. 3-state i don't understand. 2-symbol because we use 0 and 1 for READ/WRITE.
For example:
(1,0,1,1,2)
(1,1,0,1,2)
Starting at step 1, if Read is 0 then { Write 1, Move Right) else {Write 0, Move Right) and then go to step 2 - which does not exist which halts program.
What does 3-state mean? Does this machine pass as turing machine? Can we simplify more?
I think the confusion might come from your use of "steps" instead of "states". You can think of a machine's state as the value it has in its memory (although as a previous poster noted, some people also take a machine's state to include the contents of the tape -- however, I don't think that definition is relevant to your question). It's possible that this change in terminology might be at the heart of your confusion. Let me explain what I think it is you're thinking. :)
You gave lists of five numbers -- for example, (1,0,1,1,2). As you correctly state, this should be interpreted (reading from left to right) as "If the machine is in state 1 AND the current square contains a 0, print a 1, move right, and change to state 2." However, your use of the word "step" seems to suggest that that "step 2" must be followed by "step 3", etc., when in reality a Turing machine can go back and forth between states (and of course, there can only be finitely many possible states).
So to answer your questions:
Turing machines keep track of "states" not "steps";
What you've described is a legitimate Turing machine;
A simpler (albeit otherwise uninteresting) Turing machine would be one that starts in the HALT state.
Edits: Grammar, Formatting, and removed a needless description of Turing machines.
Response to comment:
Correct me if I'm misinterpreting your comment, but I did not mean to suggest a Turing machine could be in more than one state at a time, only that the number of possible states can be any finite number. For example, for a 3-state machine, you might label the possible states A, B, and C. (In the example you provided, you labeled the two possible states as '1' and '2') At any given time, exactly one of those values (states) would be in the machine's memory. We would say, "the machine is in state A" or "the machine is in state B", etc. (Your machine starts in state '1' and terminates after it enters state '2').
Also, it's no longer clear to me what you mean by a "simpler/est" machine. The smallest known Universal Turing machine (i.e., a Turing machine that can simulate another Turing machine, given an appropriate tape) requires 2 states and 5 symbols (see the relevant Wikipedia article).
On the other hand, if you're looking for something simpler than a Turing machine with the same computation power, Post-Turing machines might be of interest.
I believe that the concept of state is basically the same as in Finite State Machines. If I recall, you need a separate termination state, to which the turing machine can transition after it has finished running the program. As for why 3 states I'd guess that the other two states are for intialisation and execution respectively.
Unfortunately none of that is guaranteed to be correct, but I thought I'd post my thoughts anyway since the question was unanswered for 5 hours. I suspect if you were to re-ask this question on cstheory.stackexchange.com you might get a better/more definative answer.
"State" in the context of Turing machines should be clarified as to which is being described: (i) the current instruction, or (ii) the list of symbols on the tape together with the current instruction, or (iii) the list of symbols on the tape together with the current instruction placed to the left of the scanned symbol or to the right of the scanned symbol. Reference