I just read a lot about how processors work and how everything is just 0s and 1s, but I have a small question.
Suppose the processor gets the input "01100001". How could it know that it's the letter 'a' and not the number 97? I don't understand this point, and I couldn't find an answer however long I searched.
Suppose the processor gets the input "01100001". How could it know that it's the letter 'a' and not the number 97?
Well, generally speaking, the processor doesn't need to know that information, and it is impossible to know how it is going to interpret that input without knowing the architecture and the associated assembly instruction.
I don't understand this point, and I couldn't find an answer however long I searched.
I think the thing you are missing is that the processor sits at the lowest layer of abstraction, the hardware level. The processor interacts with memory, which is where your example value would reside. What is done with that memory is up to the software, and it is also up to the software to decide how to interpret that value when the memory location is read. If you are wondering how a number like that would be printed, the answer is that the processor alone wouldn't do it; there would be some peripheral, which the processor interfaces with, that is responsible for that.
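As a small illustration (Python here purely for convenience; the byte is the one from your question), the very same bit pattern prints as 97 or as 'a' depending only on how the software chooses to interpret it:

# The same 8 bits from the question: 01100001
bits = 0b01100001

# Interpreted as an unsigned integer, it is 97.
print(bits)        # 97

# Interpreted as an ASCII/Unicode code point, it is the letter 'a'.
print(chr(bits))   # a

# The raw byte carries no type information; only the code that reads
# it decides which interpretation to apply.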
I encourage you to read more about CPUs.
I'm curious as to why we are not allowed to use registers as offsets in MIPS. I know that you can't use registers as offsets like this: lw $t3, $t1($t4); I'm just curious as to why that is the case.
Is it a hardware restriction? Or simply just part of the ISA?
PS: if you're looking for what to do instead, see "Load Word in MIPS, using register instead of immediate offset from another register", or look at compiler output for a C function like int foo(int *arr, int idx){ return arr[idx]; } - https://godbolt.org/z/PhxG57ox1
I'm curious as to why we are not allowed to use registers as offsets in MIPS.
I'm not sure if you mean "why does MIPS assembly not permit you to write it in this form" or "why does the underlying ISA not offer this form".
If it's the former, then the answer is that the base ISA doesn't have any machine instructions that offer that functionality, and apparently the designers decided not to offer any pseudo-instruction that would implement it behind the scenes.2
If you're asking why the ISA doesn't offer it in the first place, it's just a design choice. By offering fewer or simpler addressing modes, you get the following advantages:
Less room is needed to encode a more limited set of possibilities, so you save encoding space for more opcodes, shorter instructions, etc.
The hardware can be simpler, or faster. For example, allowing two registers in address calculation may result in:
The need for an additional read port in the register file1.
Additional connections between the register file and the AGU to get both registers' values there.
The need to do a full-width (32- or 64-bit) addition, rather than a simpler addition of an address-sized base and a 16-bit offset.
The need for a three-input ALU if you still want to support immediate offsets together with 2-register addresses (and those are less useful if you don't).
Additional complexity in instruction decoding and address-generation since you may need to support two quite different paths for address generation.
Of course, all of those trade-offs may very well pay off in some contexts that could make good use of 2-reg addressing with smaller or faster code, but the original design, which was heavily inspired by the RISC philosophy, didn't include it. As Peter points out in the comments, new addressing modes have been added subsequently for some cases, although apparently not a general 2-reg addressing mode for load or store.
Is it a hardware restriction? Or simply just part of the ISA?
There's a bit of a false dichotomy there. It's not a hardware restriction in the sense that hardware could certainly have supported this, even when MIPS was designed. The question seems to imply that some existing hardware had that restriction and the MIPS ISA somehow inherited it. I suspect it was much the other way around: the ISA was defined this way, based on analysis of how the hardware would likely be implemented, and it then became a hardware simplification, since MIPS hardware doesn't need to support anything outside of what's in the MIPS ISA.
1 E.g., to support store instructions which would need to read from 3 registers.
2 It's certainly worth asking whether such a pseudo-instruction is a good idea or not: it would probably expand to an add of the two registers into a temporary register and then a lw using the result. There is always a danger that this hides "too much" work. Since it partly glosses over the difference between a true load that maps 1:1 to a hardware load and a version that does extra arithmetic behind the covers, it is easy to imagine it leading to sub-optimal decisions.
Take the classic example of linearly accessing two arrays of equal element size in a loop. With 2-reg addressing, it is natural to write this loop as two 2-reg accesses (each with a different base register and a common offset register). The only "overhead" for the offset maintenance is the single offset increment. This hides the fact that internally there are two hidden adds required to support the addressing mode: it would have simply been better to increment each base directly and not use the offset. Furthermore, once the overhead is clear, you can see that unrolling the loop and using immediate offsets can further reduce the overhead.
I have found the keras-rl/examples/cem_cartpole.py example and I would like to understand it, but I can't find any documentation.
What does the line
memory = EpisodeParameterMemory(limit=1000, window_length=1)
do? What is the limit and what is the window_length? What effect does increasing either or both parameters have?
EpisodeParameterMemory is a special class that is used for CEM. In essence it stores the parameters of a policy network that were used for an entire episode (hence the name).
Regarding your questions: The limit parameter simply specifies how many entries the memory can hold. After exceeding this limit, older entries will be replaced by newer ones.
The second parameter is not used in this specific type of memory (CEM is somewhat of an edge case in Keras-RL and is mostly there as a simple baseline). Typically, however, the window_length parameter controls how many observations are concatenated to form a "state". This may be necessary if the environment is not fully observable (think of it as transforming a POMDP into an MDP, or at least approximately so). DQN on Atari uses this since a single frame is clearly not enough to infer the velocity of a ball with a feed-forward network, for example.
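As a rough sketch of how the two parameters are typically set (following the keras-rl examples such as cem_cartpole.py and dqn_atari.py; the numbers are just illustrative):

from rl.memory import EpisodeParameterMemory, SequentialMemory

# CEM: stores up to 1000 (policy parameters, episode reward) entries;
# window_length is accepted but effectively unused here.
cem_memory = EpisodeParameterMemory(limit=1000, window_length=1)

# DQN on Atari: stores up to 1,000,000 transitions and stacks the last
# 4 observations into one "state" so that velocities can be inferred.
dqn_memory = SequentialMemory(limit=1000000, window_length=4)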
Generally, I recommend reading the relevant paper (again, CEM is somewhat of an exception). It should then become relatively clear what each parameter means. I agree that Keras-RL desperately needs documentation but I don't have time to work on it right now, unfortunately. Contributions to improve the situation are of course always welcome ;).
A little late to the party, but I feel like the answer doesn't really answer the question.
I found this description online (https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html#replay-memory):
We’ll be using experience replay memory for training our DQN. It stores the transitions that the agent observes, allowing us to reuse this data later. By sampling from it randomly, the transitions that build up a batch are decorrelated. It has been shown that this greatly stabilizes and improves the DQN training procedure.
Basically you observe and save all of your state transitions so that you can train your network on them later on (instead of having to make observations from the environment all the time).
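A minimal sketch of such a replay buffer (illustrative only; the names are made up and this is not the keras-rl or PyTorch implementation):

import random
from collections import deque, namedtuple

Transition = namedtuple('Transition', ['state', 'action', 'reward', 'next_state', 'done'])

class ReplayMemory:
    def __init__(self, limit):
        # Oldest transitions are dropped automatically once `limit` is exceeded.
        self.buffer = deque(maxlen=limit)

    def push(self, *args):
        self.buffer.append(Transition(*args))

    def sample(self, batch_size):
        # Uniform random sampling decorrelates the transitions within a batch.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)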
I am currently reading about / learning Erlang, and it is often noted that it is not (really) suitable for 'heavy number crunching'. I often come across this phrase or similar ones, but I never really know what 'heavy' actually means.
How does one decide if an operation is computationally intensive? Can it be quantified before testing?
Edit:
Is there a difference between the quantity of calculations, the complexity of the algorithm, or the size of the input values?
For example, 1000 computations of 28303 / 4 vs. 100 computations of 239847982628763482 / 238742.
When you are talking about Erlang specifically, I doubt that you in general want to develop applications that require intensive number crunching with it. That is - you don't learn Erlang to code a physics engine in it. So don't worry about Erlang being too slow for you.
Moving from Erlang to the question in general, these things almost always come down to relativity. Let's ignore number crunching and ask a general question about programming: How fast is fast enough?
Well, fast enough depends on:
what you want to do with the application
how often you want to do it
how fast your users expect it to happen
If reading a file in some program takes 1ms or 1000ms - is 1000 ms to be considered "too slow"?
If ten files have to be read in quick succession - yes, probably way too slow. Imagine an XML parser that takes 1 second to simply read an XML file from disk - horrible!
If a file on the other hand only has to be read when a user manually clicks a button every 15 minutes or so then it's not a problem, e.g. in Microsoft Word.
The reason nobody says exactly what "too slow" is, is that it doesn't really matter. The same goes for your specific question. A language should rarely, if ever, be shunned for being "slow".
And last but not least, if you develop some monstrous project in Erlang and, down the road, realise that dagnabbit! you really need to crunch those numbers - then you do your research, find good libraries and implement algorithms in the language best suited for it, and then interop with that small library.
With this sort of thing you'll know it when you see it! Usually this refers to situations when it matters if you pick an int, float, double etc. Things like physical simulations or monte carlo methods, where you want to do millions of calculations.
To be honest, in reality you just write those bits in C and use your favourite other language to run them.
I once asked a question about number crunching in CouchDB map/reduce: CouchDB Views: How much processing is acceptable in map reduce?
What's interesting in one of the answers is this:
Suppose you had 10,000 documents and they take 1 second each to process (which is way higher than I have ever seen). That is 10,000 seconds or 2.8 hours to completely build the view. However once the view is complete, querying any row (?key=...) or row slice (?startkey=...&endkey=...) takes the same time as querying for documents directly. Lookup time is O(log n) for the document count. In other words, even if it takes 1 second per document to execute the map, it will take a few milliseconds to fetch the result. (Of course, the view must build first, since it is actually an index.)
If you think about your current question in those terms, that's an interesting angle on it. On the topic of the language's speed / optimization:
How does one decide if an operation is computationally intensive?
Facebook asked this question about PHP, and ended up writing HipHop to solve the problem -- it compiles PHP into C++. They said the reason PHP is much slower than C++ is that the PHP language is all dynamic lookup, and therefore much processing is required to do anything with variables, arrays, dynamic typing (which is a source of slowdown), etc.
So, a question you can ask is: is Erlang dynamically looked up? Statically typed? Compiled?
Is there a difference between the quantity of calculations, the complexity of the algorithm, or the size of the input values? For example, 1000 computations of 28303 / 4 vs. 100 computations of 239847982628763482 / 238742.
So, with that said, the fact that you can even assign specific types to numbers of different kinds means you SHOULD be using the right types, and that will definitely improve performance.
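If you want to quantify it before committing, you can simply measure. Here is a quick sketch in Python rather than Erlang (purely illustrative; the point is that cost depends on both the number of operations and the size of the operands):

import timeit

# 1000 divisions of small operands vs. 100 divisions of large operands,
# using setup variables so the expressions aren't constant-folded away.
small = timeit.timeit('a // b', setup='a, b = 28303, 4', number=1000)
big = timeit.timeit('a // b', setup='a, b = 239847982628763482, 238742', number=100)

print(f'1000 small divisions: {small:.6f} s')
print(f'100 large divisions:  {big:.6f} s')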
Suitability for number crunching depends on the library support and the inherent nature of the language. For example, a pure functional language will not allow any mutable variables, which makes it extremely interesting to implement any equation-solving type of problem. Erlang probably falls into this category.
I've been looking for the answer on Google and can't seem to find it. Binary is represented in bytes/octets, 8 bits. So the character a is 01100001, and the word hey is
01101000
01100101
01111001
So my question is, why 8? Is this just a good number for the computer to work with? And I've noticed that 32-bit / 64-bit computers are all multiples of eight... so does this all have to do with how the first computers were made?
Sorry if this question doesn't meet the Q/A standards... it's not code related but I can't think of anywhere else to ask it.
The answer is really "historical reasons".
Computer memory must be addressable at some level. When you ask your RAM for information, you need to specify which information you want - and it will return that to you. In theory, one could produce bit-addressable memory: you ask for one bit, you get one bit back.
But that wouldn't be very efficient, since the interface connecting the processor to the memory needs to be able to convey enough information to specify which address it wants. The smaller the granularity of access, the more wires you need (or the more pushes along the same number of wires) before you've given an accurate enough address for retrieval. Also, returning one bit multiple times is less efficient than returning multiple bits one time (side note: true in general. This is a serial-vs-parallel debate, and due to reduced system complexity and physics, serial interfaces can generally run faster. But overall, more bits at once is more efficient).
Secondly, the total amount of memory in the system is limited in part by the size of the smallest addressable block, since unless you used variably-sized memory addresses, you only have a finite number of addresses to work with - but each address represents a number of bits which you get to choose. So a system with logically byte-addressable memory can hold eight times the RAM of one with logically bit-addressable memory.
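As a quick worked example, assuming a hypothetical fixed 32-bit address width on both systems:

address_bits = 32

# Byte-addressable: 2**32 addresses, one byte each -> 4 GiB reachable.
byte_addressable_bytes = 2 ** address_bits

# Bit-addressable: 2**32 addresses, one bit each -> only 512 MiB reachable.
bit_addressable_bytes = 2 ** address_bits // 8

print(byte_addressable_bytes // 2**30, 'GiB')   # 4 GiB
print(bit_addressable_bytes // 2**20, 'MiB')    # 512 MiB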
So, we use memory logically addressable at less fine levels (although physically no RAM chip will return just one byte). Only powers of two really make sense for this, and historically the level of access has been a byte. It could just as easily be a nibble or a two-byte word, and in fact older systems did have smaller chunks than eight bits.
Now, of course, modern processors mostly eat memory in cache-line-sized increments, but our means of expressing groupings and dividing the now-virtual address space remained, and the smallest amount of memory which a CPU instruction can access directly is still an eight-bit chunk. The machine code for the CPU instructions (and/or the paths going into the processor) would have to grow the same way the number of wires connecting to the memory controller would in order for the registers to be addressable - it's the same problem as with the system memory accessibility I was talking about earlier.
"In the early 1960s, AT&T introduced digital telephony first on long-distance trunk lines. These used the 8-bit µ-law encoding. This large investment promised to reduce transmission costs for 8-bit data. The use of 8-bit codes for digital telephony also caused 8-bit data octets to be adopted as the basic data unit of the early Internet"
http://en.wikipedia.org/wiki/Byte
Not sure how true that is. It seems that that's just the symbol and style adopted by the IEEE, though.
One reason why we use 8-bit bytes is that the complexity of the world around us has a definitive structure. On the scale of human beings, the observed physical world has a finite number of distinctive states and patterns. Our innately restricted ability to classify information, to distinguish order from chaos, and the finite amount of memory in our brains - these are all reasons why we choose [2^8...2^64] states to be enough to satisfy our everyday basic computational needs.
I read in Sebesta's book that the compiler spends most of its time lexing source code, so optimizing the lexer is a necessity, unlike the syntax analyzer.
If this is true, why does the lexical analysis stage take so much time compared to syntax analysis in general?
By syntax analysis I mean the derivation process.
First, I don't think it actually is true: in many compilers, most time is not spent lexing source code. For example, in C++ compilers (e.g. g++), most time is spent in semantic analysis, in particular in overload resolution (trying to find out what implicit template instantiations to perform). Also, in C and C++, most time is often spent in optimization (creating graph representations of individual functions or the whole translation unit, and then running long algorithms on these graphs).
When comparing lexical and syntactical analysis, it may indeed be the case that lexical analysis is more expensive. This is because both use state machines, i.e. there is a fixed number of actions per element, but the number of elements is much larger in lexical analysis (characters) than in syntactical analysis (tokens).
Lexical analysis is the process whereby all the characters in the source code are converted to tokens. For instance
foreach (x in o)
is read character by character - "f", "o", etc.
The lexical analyser must determine the keywords being seen ("foreach", not "for" and so on.)
By the time syntactic analysis occurs the program code is "just" a series of tokens. That said, I agree with the answer above that lexical analysis is not necessarily the most time-consuming process, just that it has the biggest stream to work with.
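To make the character-vs-token asymmetry concrete, here is a minimal lexer sketch (the token names and the tiny grammar are made up for illustration):

import re

# One regex alternative per token class; keywords are tried before identifiers.
TOKEN_SPEC = [
    ('KEYWORD', r'\b(?:foreach|for|in)\b'),
    ('IDENT',   r'[A-Za-z_]\w*'),
    ('LPAREN',  r'\('),
    ('RPAREN',  r'\)'),
    ('SKIP',    r'\s+'),
]
LEXER = re.compile('|'.join(f'(?P<{name}>{pattern})' for name, pattern in TOKEN_SPEC))

def tokenize(source):
    # The lexer visits every character of the input...
    for match in LEXER.finditer(source):
        if match.lastgroup != 'SKIP':
            yield (match.lastgroup, match.group())

# ...but the parser only ever sees the much shorter token stream.
print(list(tokenize('foreach (x in o)')))
# [('KEYWORD', 'foreach'), ('LPAREN', '('), ('IDENT', 'x'),
#  ('KEYWORD', 'in'), ('IDENT', 'o'), ('RPAREN', ')')]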
It depends really where you draw the line between lexing and parsing. I tend to have a very limited view of what a token is, and as a result my parsers spend a lot more time on parsing than on lexing, not because they are faster, but because they simply do less.
It certainly used to be the case that lexing was expensive. Part of that had to do with limited memory and with doing multiple file operations to read in bits of the program. Now that memory is measured in GB, this is no longer an issue, and for the same reason a lot more work can be done, so optimization is more important. Of course, whether the optimization helps much is another question.