Brainfuck with 1bit memory cells? - binary

Would an implementation of the programming language Brainfuck, still be turing complete if its memory cells were 1bit in capacity, instead of the usual 8bit?
The + and - instructions become identical, however this need not be a problem.
I see no issue with, for example 4bit memory cells, but I cannot work out if this scales all the way to single bit values.

Yes, the resulting language would still be Turing-complete. In fact, several such languages exist. One of them is Boolfuck. It does exactly what you suggest: have each cell be a single bit and get rid of -, because it's redundant. It also uses ; instead . for output. The official website contains a reduction from Brainfuck to Boolfuck which proves Boolfuck's Turing-completeness. I'm reiterating the reduction here to make the answer self-contained:
Brain. Bool.
+ >[>]+<[+<]>>>>>>>>>[+]<<<<<<<<<
- >>>>>>>>>+<<<<<<<<+[>+]<[<]>>>>>>>>>[+]<<<<<<<<<
< <<<<<<<<<
> >>>>>>>>>
, >,>,>,>,>,>,>,>,<<<<<<<<
. >;>;>;>;>;>;>;>;<<<<<<<<
[ >>>>>>>>>+<<<<<<<<+[>+]<[<]>>>>>>>>>[+<<<<<<<<[>]+<[+<]
] >>>>>>>>>+<<<<<<<<+[>+]<[<]>>>>>>>>>]<[+<]
Other bit-based Brainfuck-derivatives include Smallfuck and BitChanger. This article may also be of interest to you, which goes through several steps of minimising the Brainfuck language by removing redundancy (including using bits instead of bytes).

Related

Computing powers of -1

Is there an established idiom for implementing (-1)^n * a?
The obvious choice of pow(-1,n) * a seems wasteful, and (1-2*(n%2)) * a is ugly and not perfectly efficient either (two multiplications and one addition instead of just setting the sign). I think I will go with n%2 ? -a : a for now, but introducing a conditional seems a bit dubious as well.
Making certain assumptions about your programming language, compiler, and CPU...
To repeat the conventional -- and correct -- wisdom, do not even think about optimizing this sort of thing unless your profiling tool says it is a bottleneck. If so, n % 2 ? -a : a will likely generate very efficient code; namely one AND, one test against zero, one negation, and one conditional move, with the AND+test and negation independent so they can potentially execute simultaneously.
Another option looks something like this:
zero_or_minus_one = (n << 31) >> 31;
return (a ^ zero_or_minus_one) - zero_or_minus_one;
This assumes 32-bit integers, arithmetic right shift, defined behavior on integer overflow, twos-complement representation, etc. It will likely compile into four instructions as well (left shift, right shift, XOR, and subtract), with a dependency between each... But it can be better for certain instruction sets; e.g., if you are vectorizing code using SSE instructions.
Incidentally, your question will get a lot more views -- and probably more useful answers -- if you tag it with a specific language.
As others have written, in most cases, readability is more important than performance and compilers, interpreters and libraries are better at optimizing than most people think. Therfore pow(-1,n) * a is likely to be an efficient solution on your platform.
If you really have a performance issue, your own suggestion n%2 ? -a : a is fine. I don't see a reason to worry about the conditional assignment.
If your language has a bitwise AND operator, you could also use n & 1 ? -a : a which should be very efficient even without any optimization. It is likely that on many platforms, this is what pow(a,b) actually does in the special case of a == -1 and b being an integer.

What is an Abstract Syntax Tree/Is it needed?

I've been interested in compiler/interpreter design/implementation for as long as I've been programming (only 5 years now) and it's always seemed like the "magic" behind the scenes that nobody really talks about (I know of at least 2 forums for operating system development, but I don't know of any community for compiler/interpreter/language development). Anyways, recently I've decided to start working on my own, in hopes to expand my knowledge of programming as a whole (and hey, it's pretty fun :). So, based off the limited amount of reading material I have, and Wikipedia, I've developed this concept of the components for a compiler/interpreter:
Source code -> Lexical Analysis -> Abstract Syntax Tree -> Syntactic Analysis -> Semantic Analysis -> Code Generation -> Executable Code.
(I know there's more to code generation and executable code, but I haven't gotten that far yet :)
And with that knowledge, I've created a very basic lexer (in Java) to take input from a source file, and output the tokens into another file. A sample input/output would look like this:
Input:
int a := 2
if(a = 3) then
print "Yay!"
endif
Output (from lexer):
INTEGER
A
ASSIGN
2
IF
L_PAR
A
COMP
3
R_PAR
THEN
PRINT
YAY!
ENDIF
Personally, I think it would be really easy to go from there to syntactic/semantic analysis, and possibly even code generation, which leads me to question: Why use an AST, when it seems that my lexer is doing just as good a job? However, 100% of my sources I use to research this topic all seem adamant that this is a necessary part of any compiler/interpreter. Am I missing the point of what an AST really is (a tree that shows the logical flow of a program)?
TL;DR: Currently in route to develop a compiler, finished the lexer, seems to me like the output would make for easy syntactic analysis/semantic analysis, rather than doing an AST. So why use one? Am I missing the point of one?
Thanks!
First off, one thing about your list of components does not make sense. Building an AST is (pretty much) the syntactic analysis, so it either shouldn't be in there, or at least come before the AST.
What you got there is a lexer. All it gives you are individual tokens. In any case, you will need an actual parser, because regular languages aren't any fun to program in. You can't even (properly) nest expressions. Heck, you can't even handle operator precedence. A token stream doesn't give you:
An idea where statements and expressions start and end.
An idea how statements are grouped into blocks.
An idea Which part of the expression has which precedence, associativity, etc.
A clear, uncluttered view at the actual structure of the program.
A structure which can be passed through a myriad of transformations, without every single pass knowing and having code to accomodate that the condition in an if is enclosed by parentheses.
... more generally, any kind of comprehension above the level of a single token.
Suppose you have two passes in your compiler which optimize certain kinds of operators applies to certain arguments (say, constant folding and algebraic simplifications like x - x -> 0). If you hand them tokens for the expression x - x * 1, these passes are cluttered with figuring out that the x * 1 part comes first. And they have to know that, lest the transformation is incorrect (consider 1 + 2 * 3).
These things are tricky enough to get right as it is, so you don't want to be pestered by parsing problems as well. That's why you solve the parsing problem first, in a separate parsing step. Then you can, say, replace a function call with its definition, without worrying about adding parenthesis so the meaning remains the same. You save time, you separate concerns, you avoid repetition, you enable simpler code in many other places, etc.
A parser figures all that out, and builds an AST which consequently holds all that information. Without any further data on the nodes, the shape of the AST alone gives you no. 1, 2, 3, and much more, for free. None of the bazillion passes that follow have to worry about it anymore.
That's not to say you always have to have an AST. For sufficiently simple languages, you can do a single-pass compiler. Instead of generating an AST or some other intermediate representation during parsing, you emit code as you go. However, this becomes harder for less simple languages and you can't reasonably do a lot of stuff (such as 70% of all optimizations and diagnostics -- and yes I just made that number up). Generally, I wouldn't advise you to do this. There are good reasons single-pass compilers are mostly dead. Even languages which permit them (e.g. C) are nowadays implemented with multiple passes and ASTs. It's a simple way to get started, but will severely limit you (and the language, if you design it) later.
You've got the AST at the wrong point in your flow diagram. Typically, the output of the lexer is a series of tokens (as you have in your output), and these are fed to the parser/syntactic analyzer, which generates the AST. So the output of your lexer is different from an AST because they are used at different points in the compilation process and fulfill different purposes.
The next logical question is: What, then, is an AST? Well, the purpose of parsing/syntactic analysis is to turn the series of tokens generated by the lexer into an AST (or parse tree). The AST is an intermediate representation that captures the relationship between syntactical elements in a way that is easier to work with programmatically. One way of thinking about this is that a text program is a one dimensional construct, and can only represent ideas as a sequence of elements, while the AST is freed from this constraint, and can represent the underlying relationships between those elements in 2 dimensions (as typically drawn), or any higher dimension space if you so choose to think about it that way.
For instance, a binary operator has two operands, let's call them A and B. In code, this may be spelled 'A * B' (assuming an infix operator - another advantage of an AST is to hide such distinctions that may be important syntactically, but not semantically), but for the compiler to "understand" this expression, it must read 5 characters sequentially, and this logic can quickly become cumbersome, given the many possibilities in even a small language. In an AST representation, however, we have a "binary operator" node whose value is '*', and that node has two children, values 'A' and 'B'.
As your compiler project progresses, I think you will begin to see the advantages of this representation.

What are "magic numbers" in computer programming?

When people talk about the use of "magic numbers" in computer programming, what do they mean?
Magic numbers are any number in your code that isn't immediately obvious to someone with very little knowledge.
For example, the following piece of code:
sz = sz + 729;
has a magic number in it and would be far better written as:
sz = sz + CAPACITY_INCREMENT;
Some extreme views state that you should never have any numbers in your code except -1, 0 and 1 but I prefer a somewhat less dogmatic view since I would instantly recognise 24, 1440, 86400, 3.1415, 2.71828 and 1.414 - it all depends on your knowledge.
However, even though I know there are 1440 minutes in a day, I would probably still use a MINS_PER_DAY identifier since it makes searching for them that much easier. Whose to say that the capacity increment mentioned above wouldn't also be 1440 and you end up changing the wrong value? This is especially true for the low numbers: the chance of dual use of 37197 is relatively low, the chance of using 5 for multiple things is pretty high.
Use of an identifier means that you wouldn't have to go through all your 700 source files and change 729 to 730 when the capacity increment changed. You could just change the one line:
#define CAPACITY_INCREMENT 729
to:
#define CAPACITY_INCREMENT 730
and recompile the lot.
Contrast this with magic constants which are the result of naive people thinking that just because they remove the actual numbers from their code, they can change:
x = x + 4;
to:
#define FOUR 4
x = x + FOUR;
That adds absolutely zero extra information to your code and is a total waste of time.
"magic numbers" are numbers that appear in statements like
if days == 365
Assuming you didn't know there were 365 days in a year, you'd find this statement meaningless. Thus, it's good practice to assign all "magic" numbers (numbers that have some kind of significance in your program) to a constant,
DAYS_IN_A_YEAR = 365
And from then on, compare to that instead. It's easier to read, and if the earth ever gets knocked out of alignment, and we gain an extra day... you can easily change it (other numbers might be more likely to change).
There's more than one meaning. The one given by most answers already (an arbitrary unnamed number) is a very common one, and the only thing I'll say about that is that some people go to the extreme of defining...
#define ZERO 0
#define ONE 1
If you do this, I will hunt you down and show no mercy.
Another kind of magic number, though, is used in file formats. It's just a value included as typically the first thing in the file which helps identify the file format, the version of the file format and/or the endian-ness of the particular file.
For example, you might have a magic number of 0x12345678. If you see that magic number, it's a fair guess you're seeing a file of the correct format. If you see, on the other hand, 0x78563412, it's a fair guess that you're seeing an endian-swapped version of the same file format.
The term "magic number" gets abused a bit, though, referring to almost anything that identifies a file format - including quite long ASCII strings in the header.
http://en.wikipedia.org/wiki/File_format#Magic_number
Wikipedia is your friend (Magic Number article)
Most of the answers so far have described a magic number as a constant that isn't self describing. Being a little bit of an "old-school" programmer myself, back in the day we described magic numbers as being any constant that is being assigned some special purpose that influences the behaviour of the code. For example, the number 999999 or MAX_INT or something else completely arbitrary.
The big problem with magic numbers is that their purpose can easily be forgotten, or the value used in another perfectly reasonable context.
As a crude and terribly contrived example:
while (int i != 99999)
{
DoSomeCleverCalculationBasedOnTheValueOf(i);
if (escapeConditionReached)
{
i = 99999;
}
}
The fact that a constant is used or not named isn't really the issue. In the case of my awful example, the value influences behaviour, but what if we need to change the value of "i" while looping?
Clearly in the example above, you don't NEED a magic number to exit the loop. You could replace it with a break statement, and that is the real issue with magic numbers, that they are a lazy approach to coding, and without fail can always be replaced by something less prone to either failure, or to losing meaning over time.
Anything that doesn't have a readily apparent meaning to anyone but the application itself.
if (foo == 3) {
// do something
} else if (foo == 4) {
// delete all users
}
Magic numbers are special value of certain variables which causes the program to behave in an special manner.
For example, a communication library might take a Timeout parameter and it can define the magic number "-1" for indicating infinite timeout.
The term magic number is usually used to describe some numeric constant in code. The number appears without any further description and thus its meaning is esoteric.
The use of magic numbers can be avoided by using named constants.
Using numbers in calculations other than 0 or 1 that aren't defined by some identifier or variable (which not only makes the number easy to change in several places by changing it in one place, but also makes it clear to the reader what the number is for).
In simple and true words, a magic number is a three-digit number, whose sum of the squares of the first two digits is equal to the third one.
Ex-202,
as, 2*2 + 0*0 = 2*2.
Now, WAP in java to accept an integer and print whether is a magic number or not.
It may seem a bit banal, but there IS at least one real magic number in every programming language.
0
I argue that it is THE magic wand to rule them all in virtually every programmer's quiver of magic wands.
FALSE is inevitably 0
TRUE is not(FALSE), but not necessarily 1! Could be -1 (0xFFFF)
NULL is inevitably 0 (the pointer)
And most compilers allow it unless their typechecking is utterly rabid.
0 is the base index of array elements, except in languages that are so antiquated that the base index is '1'. One can then conveniently code for(i = 0; i < 32; i++), and expect that 'i' will start at the base (0), and increment to, and stop at 32-1... the 32nd member of an array, or whatever.
0 is the end of many programming language strings. The "stop here" value.
0 is likewise built into the X86 instructions to 'move strings efficiently'. Saves many microseconds.
0 is often used by programmers to indicate that "nothing went wrong" in a routine's execution. It is the "not-an-exception" code value. One can use it to indicate the lack of thrown exceptions.
Zero is the answer most often given by programmers to the amount of work it would take to do something completely trivial, like change the color of the active cell to purple instead of bright pink. "Zero, man, just like zero!"
0 is the count of bugs in a program that we aspire to achieve. 0 exceptions unaccounted for, 0 loops unterminated, 0 recursion pathways that cannot be actually taken. 0 is the asymptote that we're trying to achieve in programming labor, girlfriend (or boyfriend) "issues", lousy restaurant experiences and general idiosyncracies of one's car.
Yes, 0 is a magic number indeed. FAR more magic than any other value. Nothing ... ahem, comes close.
rlynch#datalyser.com

One line functions in C?

What do you think about one line functions? Is it bad?
One advantage I can think of is that it makes the code more comprehensive (if you choose a good name for it). For example:
void addint(Set *S, int n)
{
(*S)[n/CHAR_SIZE] |= (unsigned char) pow(2, (CHAR_SIZE - 1) - (n % CHAR_SIZE));
}
One disadvantage I can think of is that it slows the code (pushing parameters to stack, jumping to a function, popping the parameters, doing the operation, jumping back to the code - and only for one line?)
is it better to put such lines in functions or to just put them in the code? Even if we use them only once?
BTW, I haven't found any question about that, so forgive me if such question had been asked before.
Don't be scared of 1-line functions!
A lot of programmers seem to have a mental block about 1-line functions, you shouldn't.
If it makes the code clearer and cleaner, extract the line into a function.
Performance probably won't be affected.
Any decent compiler made in the last decade (and perhaps further) will automatically inline a simple 1-line function. Also, 1-line of C can easily correspond to many lines of machine code. You shouldn't assume that even in the theoretical case where you incur the full overhead of a function call that this overhead is significant compared to your "one little line". Let alone significant to the overall performance of your application.
Abstraction Leads to Better Design. (Even for single lines of code)
Functions are the primary building blocks of abstract, componentized code, they should not be neglected. If encapsulating a single line of code behind a function call makes the code more readable, do it. Even in the case where the function is called once. If you find it important to comment one particular line of code, that's a good code smell that it might be helpful to move the code into a well-named function.
Sure, that code may be 1-line today, but how many different ways of performing the same function are there? Encapsulating code inside a function can make it easier to see all the design options available to you. Maybe your 1-line of code expands into a call to a webservice, maybe it becomes a database query, maybe it becomes configurable (using the strategy pattern, for example), maybe you want to switch to caching the value computed by your 1-line. All of these options are easier to implement and more readily thought of when you've extracted your 1-line of code into its own function.
Maybe Your 1-Line Should Be More Lines.
If you have a big block of code it can be tempting to cram a lot of functionality onto a single line, just to save on screen real estate. When you migrate this code to a function, you reduce these pressures, which might make you more inclined to expand your complex 1-liner into more straightforward code taking up several lines (which would likely improve its readability and maintainability).
I am not a fan of having all sort of logic and functionality banged into one line. The example you have shown is a mess and could be broken down into several lines, using meaningful variable names and performing one operation after another.
I strongly recommend, in every question of this kind, to have a look (buy it, borrow it, (don't) download it (for free)) at this book: Robert C. Martin - Clean Code. It is a book every developer should have a look at.
It will not make you a good coder right away and it will not stop you from writing ugly code in the future, it will however make you realise it when you are writing ugly code. It will force you to look at your code with a more critical eye and to make your code readable like a newspaper story.
If used more than once, definitely make it a function, and let the compiler do the inlining (possibly adding "inline" to the function definition). (<Usual advice about premature optimization goes here>)
Since your example appears to be using a C(++) syntax you may want to read up on inline functions which eliminate the overhead of calling a simple function. This keyword is only recommendation to the compiler though and it may not inline all functions that you mark, and may choose to inline unmarked functions.
In .NET the JIT will inline methods that it feels is appropiate, but you have no control over why or when it does this, though (as I understand it) debug builds will never inline since that would stop the source code matching the compiled application.
What language? If you mean C, I'd also use the inline qualifier. In C++, I have the option of inline, boost.lamda or and moving forward C++0x native support for lamdas.
There is nothing wrong with one line functions. As mentioned it is possible for the compiler to inline the functions which will remove any performance penalty.
Functions should also be preferred over macros as they are easier to debug, modify, read and less likely to have unintended side effects.
If it is used only once then the answer is less obvious. Moving it to a function can make the calling function simpler & clearer by moving some of the complexity into the new function.
If you use the code within that function 3 times or more, then I would recommend to put that in a function. Only for maintainability.
Sometimes it's not a bad idea to use the preprocessor:
#define addint(S, n) (*S)[n/CHAR_SIZE] |= (unsigned char) pow(2, (CHAR_SIZE - 1) - (n % CHAR_SIZE));
Granted, you don't get any kind of type checking, but in some cases this can be useful. Macros have their disadvantages and their advantages, and in a few cases their disadvantages can become advantages. I'm a fan of macros in appropriate places, but it's up to you to decide when is appropriate. In this case, I'm going to go out on a limb and say that, whatever you end up doing, that one line of code is quite a bit.
#define addint(S, n) do { \
unsigned char c = pow(2, (CHAR_SIZE -1) - (n % CHAR_SIZE)); \
(*S)[n/CHAR_SIZE] |= c \
} while(0)

What exactly is the danger of using magic debug values (such as 0xDEADBEEF) as literals?

It goes without saying that using hard-coded, hex literal pointers is a disaster:
int *i = 0xDEADBEEF;
// god knows if that location is available
However, what exactly is the danger in using hex literals as variable values?
int i = 0xDEADBEEF;
// what can go wrong?
If these values are indeed "dangerous" due to their use in various debugging scenarios, then this means that even if I do not use these literals, any program that during runtime happens to stumble upon one of these values might crash.
Anyone care to explain the real dangers of using hex literals?
Edit: just to clarify, I am not referring to the general use of constants in source code. I am specifically talking about debug-scenario issues that might come up to the use of hex values, with the specific example of 0xDEADBEEF.
There's no more danger in using a hex literal than any other kind of literal.
If your debugging session ends up executing data as code without you intending it to, you're in a world of pain anyway.
Of course, there's the normal "magic value" vs "well-named constant" code smell/cleanliness issue, but that's not really the sort of danger I think you're talking about.
With few exceptions, nothing is "constant".
We prefer to call them "slow variables" -- their value changes so slowly that we don't mind recompiling to change them.
However, we don't want to have many instances of 0x07 all through an application or a test script, where each instance has a different meaning.
We want to put a label on each constant that makes it totally unambiguous what it means.
if( x == 7 )
What does "7" mean in the above statement? Is it the same thing as
d = y / 7;
Is that the same meaning of "7"?
Test Cases are a slightly different problem. We don't need extensive, careful management of each instance of a numeric literal. Instead, we need documentation.
We can -- to an extent -- explain where "7" comes from by including a tiny bit of a hint in the code.
assertEquals( 7, someFunction(3,4), "Expected 7, see paragraph 7 of use case 7" );
A "constant" should be stated -- and named -- exactly once.
A "result" in a unit test isn't the same thing as a constant, and requires a little care in explaining where it came from.
A hex literal is no different than a decimal literal like 1. Any special significance of a value is due to the context of a particular program.
I believe the concern raised in the IP address formatting question earlier today was not related to the use of hex literals in general, but the specific use of 0xDEADBEEF. At least, that's the way I read it.
There is a concern with using 0xDEADBEEF in particular, though in my opinion it is a small one. The problem is that many debuggers and runtime systems have already co-opted this particular value as a marker value to indicate unallocated heap, bad pointers on the stack, etc.
I don't recall off the top of my head just which debugging and runtime systems use this particular value, but I have seen it used this way several times over the years. If you are debugging in one of these environments, the existence of the 0xDEADBEEF constant in your code will be indistinguishable from the values in unallocated RAM or whatever, so at best you will not have as useful RAM dumps, and at worst you will get warnings from the debugger.
Anyhow, that's what I think the original commenter meant when he told you it was bad for "use in various debugging scenarios."
There's no reason why you shouldn't assign 0xdeadbeef to a variable.
But woe betide the programmer who tries to assign decimal 3735928559, or octal 33653337357, or worst of all: binary 11011110101011011011111011101111.
Big Endian or Little Endian?
One danger is when constants are assigned to an array or structure with different sized members; the endian-ness of the compiler or machine (including JVM vs CLR) will affect the ordering of the bytes.
This issue is true of non-constant values, too, of course.
Here's an, admittedly contrived, example. What is the value of buffer[0] after the last line?
const int TEST[] = { 0x01BADA55, 0xDEADBEEF };
char buffer[BUFSZ];
memcpy( buffer, (void*)TEST, sizeof(TEST));
I don't see any problem with using it as a value. Its just a number after all.
There's no danger in using a hard-coded hex value for a pointer (like your first example) in the right context. In particular, when doing very low-level hardware development, this is the way you access memory-mapped registers. (Though it's best to give them names with a #define, for example.) But at the application level you shouldn't ever need to do an assignment like that.
I use CAFEBABE
I haven't seen it used by any debuggers before.
int *i = 0xDEADBEEF;
// god knows if that location is available
int i = 0xDEADBEEF;
// what can go wrong?
The danger that I see is the same in both cases: you've created a flag value that has no immediate context. There's nothing about i in either case that will let me know 100, 1000 or 10000 lines that there is a potentially critical flag value associated with it. What you've planted is a landmine bug that, if I don't remember to check for it in every possible use, I could be faced with a terrible debugging problem. Every use of i will now have to look like this:
if (i != 0xDEADBEEF) { // Curse the original designer to oblivion
// Actual useful work goes here
}
Repeat the above for all of the 7000 instances where you need to use i in your code.
Now, why is the above worse than this?
if (isIProperlyInitialized()) { // Which could just be a boolean
// Actual useful work goes here
}
At a minimum, I can spot several critical issues:
Spelling: I'm a terrible typist. How easily will you spot 0xDAEDBEEF in a code review? Or 0xDEADBEFF? On the other hand, I know that my compile will barf immediately on isIProperlyInitialised() (insert the obligatory s vs. z debate here).
Exposure of meaning. Rather than trying to hide your flags in the code, you've intentionally created a method that the rest of the code can see.
Opportunities for coupling. It's entirely possible that a pointer or reference is connected to a loosely defined cache. An initialization check could be overloaded to check first if the value is in cache, then to try to bring it back into cache and, if all that fails, return false.
In short, it's just as easy to write the code you really need as it is to create a mysterious magic value. The code-maintainer of the future (who quite likely will be you) will thank you.