How to interpret the variables when we drop an independent variable to avoid collinearity in regression?

I went through this and this. So whenever there are dummy variables which happen to be collinear, we need to drop one of them to avoid collinearity.
My confusion is: how would I interpret the values of the other dummy variables with respect to the variable that I dropped?
For example, suppose I have three binary variables - Human, Bart, and Pegasus - indicating the architecture, and the response measures how much word repetition there is in each architecture's output. I dropped the Human column, and the fitted coefficients for Bart and Pegasus were:
Bart: -83.0908 and Pegasus: 76.5434. How do I interpret these in terms of the Human architecture? Please help!
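One way to see what the coefficients mean is to simulate it: with Human dropped, the intercept is the mean for Human (the reference category), and each remaining dummy's coefficient is that architecture's mean minus Human's mean. Here is a minimal sketch in Python with pandas/statsmodels; the data and group means are invented to mirror the coefficients in the question.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Invented data: 50 texts per architecture, with made-up mean repetition
# scores chosen so the fitted coefficients echo the question's numbers.
rng = np.random.default_rng(0)
df = pd.DataFrame({"arch": np.repeat(["Human", "Bart", "Pegasus"], 50)})
means = {"Human": 200.0, "Bart": 117.0, "Pegasus": 276.0}
df["repetition"] = df["arch"].map(means) + rng.normal(0, 10, len(df))

# Dummy-code the architecture, dropping Human as the reference category.
X = sm.add_constant(pd.get_dummies(df["arch"])[["Bart", "Pegasus"]].astype(float))
fit = sm.OLS(df["repetition"], X).fit()
print(fit.params)
# const   ~ 200 -> estimated mean repetition for Human (the reference)
# Bart    ~ -83 -> Bart's mean repetition minus Human's
# Pegasus ~ +76 -> Pegasus's mean repetition minus Human's

So a coefficient of -83.09 for Bart says Bart's expected repetition is about 83 units lower than Human's, and +76.54 for Pegasus says about 76 units higher than Human's.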

Related

Using binary outcome variable and predictors with geeglm and gee in R?

When I use glm to perform logistic regression, I first read my binary outcome and predictor variables from a CSV file and convert them to factors (they are read in as ints). But if I do that before using gee or geeglm for clustering, geeglm gives me an error message, and although gee doesn't give an error, its documentation says that you must use continuous predictors. It seems like I get sensible results if I just read in my binary variables and leave them as ints, without converting them to factors. Is this an OK thing to do, and are there any concerns I need to be aware of? I specify binomial, and both are for generalized linear models, so I'm not sure what's so wrong with passing factors.
Relatedly, in the scenario I'm describing, do I need to specify scale.fix=TRUE?
Thanks in advance!

What exactly is a datatype?

I understand intuitively what a datatype is, but I need the formal definition. I don't understand whether it is a set, or whether it's the names 'int', 'float', etc. The formal definition found on Wikipedia is confusing.
In computer programming, a data type is a classification identifying one of various types of data, such as floating-point, integer, or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of that type can be stored.
Can anyone help me with that?
Yep. What that's saying is that a data type has three pieces:
The various possible values. So, for example, an eight-bit signed integer might have -128..127. Think of that as a set of values V.
The operations: so an 8-bit signed integer might have +, -, * (multiply), and / (divide). The full definition would define those as functions from V × V into V, or possibly as a function from V × V into float for division.
The way it's stored - I sort of gave it away when I said "eight-bit signed integer". The other detail is that I'm assuming a specific representation by the way I showed the range of values.
You might, if you're into object-oriented programming, notice that this is very much like the definition of a class, which is defined by the storage used by each object and the methods of the class. Providing those parts for some arbitrary thing, but not inheritance rules, gives you what's called an abstract data type.
Update
#Appy, there's some room for differences in the formalities. I was a little subtle because it was late and I was suddenly uncertain whether I'd assumed one's complement or two's complement - of course it's two's complement. So interpretation is included in my description. Abstractly, though, you'd say it is an algebraic structure T = (V, O), where V is a set of values and O a set of functions from V into some arbitrary type - remember that '==', for example, will be a function eq: V × V → {0, 1}, so you can't expect every operation to map into V.
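To make that concrete, here is a toy rendering of T = (V, O) in Python. This is purely illustrative - the names wrap, add, and eq are mine, not part of any standard definition.

# V: the 256 values of an 8-bit two's-complement integer.
V = range(-128, 128)

def wrap(n):
    # Map any int back into V, two's-complement style.
    return (n + 128) % 256 - 128

def add(a, b):   # add: V x V -> V (overflow wraps around)
    return wrap(a + b)

def eq(a, b):    # eq: V x V -> {0, 1}, a truth value rather than an integer
    return 1 if a == b else 0

O = {"+": add, "==": eq}
print(add(127, 1))  # -128: the chosen representation shows through on overflow
print(eq(3, 3))     # 1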
I can define it as a classification of a particular type of information. It is easy for humans to distinguish between different types of data. We can usually tell at a glance whether a number is a percentage, a time, or an amount of money. We do this through special symbols %, :, and $.
Basically it's a concept that I am sure you grok. For computers, however, a data type is defined and has various associated attributes: a size, a defining keyword (sometimes), the values it can take (numbers or characters, for example), and the operations that can be done on it, like add/subtract for numbers, append for strings, or compare for characters. These differ from language to language and even from environment to environment (16- vs. 32-bit ints, 32- vs. 64-bit environments, etc.).
If there is anything I am missing or needs refining please ask as this is fairly open ended.

Purity vs Referential transparency

The terms do appear to be defined differently, but I've always thought of one implying the other; I can't think of any case when an expression is referentially transparent but not pure, or vice-versa.
Wikipedia maintains separate articles for these concepts and says:
From Referential transparency:
If all functions involved in the expression are pure functions, then the expression is referentially transparent. Also, some impure functions can be included in the expression if their values are discarded and their side effects are insignificant.
From Pure expressions:
Pure functions are required to construct pure expressions. [...] Pure expressions are often referred to as being referentially transparent.
I find these statements confusing. If the side effects from a so-called "impure function" are insignificant enough to allow not performing them (i.e. replace a call to such a function with its value) without materially changing the program, it's the same as if it were pure in the first place, isn't it?
Is there a simpler way to understand the differences between a pure expression and a referentially transparent one, if any? If there is a difference, an example expression that clearly demonstrates it would be appreciated.
If I gather in one place any three theorists of my acquaintance, at least two of them disagree on the meaning of the term "referential transparency." And when I was a young student, a mentor of mine gave me a paper explaining that even if you consider only the professional literature, the phrase "referentially transparent" is used to mean at least three different things. (Unfortunately that paper is somewhere in a box of reprints that have yet to be scanned. I searched Google Scholar for it but I had no success.)
I cannot inform you, but I can advise you to give up: Because even the tiny cadre of pointy-headed language theorists can't agree on what it means, the term "referentially transparent" is not useful. So don't use it.
P.S. On any topic to do with the semantics of programming languages, Wikipedia is unreliable. I have given up trying to fix it; the Wikipedian process seems to value change and popular voting over stability and accuracy.
All pure functions are necessarily referentially transparent. Since, by definition, they cannot access anything other than what they are passed, their result must be fully determined by their arguments.
However, it is possible to have referentially transparent functions which are not pure. I can write a function which is given an int i, then generates a random number r, subtracts r from itself and places the result in s (so s is always 0), then returns i - s (which is always i). Clearly this function is impure, because it generates random numbers. However, it is referentially transparent. In this case the example is silly and contrived, but in, e.g., Haskell, the id function is of type a -> a, whereas my stupidId function would be of type a -> IO a, indicating that it makes use of side effects. When a programmer can guarantee, by means of an external proof, that their function is actually referentially transparent, they can use unsafePerformIO to strip the IO back off the type.
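For the sake of illustration, here is a Python transcription of that stupidId idea (my own sketch - in the Haskell original, the RNG call is what forces the IO type):

import random

def stupid_id(i):
    r = random.random()  # impure: reads and advances global RNG state
    s = r - r            # always 0.0, whatever r was
    return i - s         # so the result is always (numerically) i

# Any call site can be replaced by its argument's value without changing
# the program's results - referentially transparent despite the effect.
print(stupid_id(42) == 42)  # True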
I'm somewhat unsure of the answer I give here, but surely somebody will point us in some direction. :-)
"Purity" is generally considered to mean "lack of side-effects". An expression is said to be pure if its evaluation lacks side-effects. What's a side-effect then? In a purely functional language, side-effect is anything that doesn't go by the simple beta-rule (the rule that to evaluate function application is the same as to substitute actual parameter for all free occurrences of the formal parameter).
For example, in a functional language with linear (or uniqueness, this distinction shouldn't bother at this moment) types some (controlled) mutation is allowed.
So I guess we have sorted out what "purity" and "side-effects" might be.
Referential transparency (according to the Wikipedia article you cited) means that a variable can be replaced by the expression it denotes (abbreviates, stands for) without changing the meaning of the program at hand (btw, this is also a hard question to tackle, and I won't attempt to do so here). So "purity" and "referential transparency" are indeed different things: "purity" is a property of an expression, roughly meaning "doesn't produce side-effects when executed", whereas "referential transparency" is a property relating a variable to the expression it stands for, meaning "the variable can be replaced with what it denotes".
Hopefully this helps.
These slides from an ACCU 2015 talk have a great summary of the topic of referential transparency.
From one of the slides:
A language is referentially transparent if (a) every subexpression can be replaced by any other that's equal to it in value and (b) all occurrences of an expression within a given context yield the same value.
You can have, for instance, a function that logs its computation to the program's standard output (so it won't be a pure function), but you can replace calls to this function with calls to a similar function that doesn't log its computation. Therefore, this function has the referential transparency property. But... the above definition is about languages, not expressions, as the slides emphasize.
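As a small illustration of that logging example (a sketch in Python; the function names are mine):

def add_logged(a, b):
    print("add(%d, %d)" % (a, b))  # observable effect: writes to stdout
    return a + b

def add_pure(a, b):
    return a + b

# If the surrounding program only uses the returned value, replacing
# add_logged with add_pure leaves every computed value unchanged.
assert add_logged(2, 3) == add_pure(2, 3)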
[...] it's the same as if it were pure in the first place, isn't it?
From the definitions we have, no, it is not.
Is there a simpler way to understand the differences between a pure expression and a referentially transparent one, if any?
Try the slides I mentioned above.
The nice thing about standards is that there are so many of them to choose from.
Andrew S. Tanenbaum.
...along with definitions of referential transparency:
from page 176 of Functional programming with Miranda by Ian Holyer:
8.1 Values and Behaviours
The most important property of the semantics of a pure functional language is that the declarative and operational views of the language coincide exactly, in the following way:
Every expression denotes a value, and there are values corresponding to all possible program behaviours. The behaviour produced by an expression in any context is completely determined by its value, and vice versa.
This principle, which is usually rather opaquely called referential transparency, can also be pictured in the following way: [diagram omitted]
and from Nondeterminism with Referential Transparency in Functional Programming Languages by F. Warren Burton:
[...] the property that an expression always has the same value in the same environment [...]
...for various other definitions, see Referential Transparency, Definiteness and Unfoldability by Harald Søndergaard and Peter Sestoft.
Instead, we'll begin with the concept of "purity". For the three of you who didn't know it already, the computer or device you're reading this on is a solid-state Turing machine, a model of computing intrinsically connected with effects. So every program, functional or otherwise, needs to use those effects To Get Things DoneTM.
What does this mean for purity? At the assembly-language level, which is the domain of the CPU, all programs are impure. If you're writing a program in assembly language, you're the one who is micro-managing the interplay between all those effects - and it's really tedious!
Most of the time, you're just instructing the CPU to move data around in the computer's memory, which only changes the contents of individual memory locations - nothing to see there! It's only when your instructions direct the CPU to e.g. write to video memory, that you observe a visible change (text appearing on the screen).
For our purposes here, we'll split effects into two coarse categories:
those involving I/O devices like screens, speakers, printers, VR-headsets, keyboards, mice, etc; commonly known as observable effects.
and the rest, which only ever change the contents of memory.
In this situation, purity just means the absence of those observable effects, the ones which cause a visible change to the environment of the running program, maybe even its host computer. It is definitely not the absence of all effects, otherwise we would have to replace our solid-state Turing machines!
Now for the question of life, the Universe, and everything: what exactly is meant by the term "referential transparency"? Instead of herding cats trying to bring theorists into agreement, let's just try to find the original meaning given to the term. Fortunately for us, the term frequently appears in the context of I/O in Haskell - we only need a relevant article... here's one: from the first page of Owen Stephens's Approaches to Functional I/O:
Referential transparency refers to the ability to replace a sub-expression with one of equal value, without changing the value of the outer expression. Originating from Quine, the term was introduced to Computer Science by Strachey.
Following the references:
From page 9 of 39 in Christopher Strachey's Fundamental Concepts in Programming Languages:
One of the most useful properties of expressions is that called by Quine referential transparency. In essence this means that if we wish to find the value of an expression which contains a sub-expression, the only thing we need to know about the sub-expression is its value. Any other features of the sub-expression, such as its internal structure, the number and nature of its components, the order in which they are evaluated or the colour of the ink in which they are written, are irrelevant to the value of the main expression.
From page 163 of 314 in Willard Van Orman Quine's Word and Object:
[...] Quotation, which thus interrupts the referential force of a term, may be said to fail of referential transparency². [...] I call a mode of containment Φ referentially transparent if, whenever an occurrence of a singular term t is purely referential in a term or sentence ψ(t), it is purely referential also in the containing term or sentence Φ(ψ(t)).
with the footnote:
2 The term is from Whitehead and Russell, 2d ed., vol. 1, p. 665.
Following that reference:
From page 709 of 719 in Principia Mathematica by Alfred North Whitehead and Bertrand Russell:
When an assertion occurs, it is made by means of a particular fact, which is an instance of the proposition asserted. But this particular fact is, so to speak, "transparent"; nothing is said about it, but by means of it something is said about something else. It is the "transparent" quality which belongs to propositions as they occur in truth-functions.
Let's try to bring all that together:
Whitehead and Russell introduce the term "transparent";
Quine then defines the qualified term "referential transparency";
Strachey then adapts Quine's definition in defining the basics of programming languages.
So it's a choice between Quine's original or Strachey's adapted definition. You can try translating Quine's definition for yourself if you like - everyone who's ever contested the definition of "purely functional" might even enjoy the chance to debate something different, like what "mode of containment" and "purely referential" really mean... have fun! The rest of us will just accept that Strachey's definition is a little vague ("In essence [...]") and continue on:
One useful property of expressions is referential transparency. In essence this means that if we wish to find the value of an expression which contains a sub-expression, the only thing we need to know about the sub-expression is its value. Any other features of the sub-expression, such as its internal structure, the number and nature of its components, the order in which they are evaluated or the colour of the ink in which they are written, are irrelevant to the value of the main expression.
(emphasis by me.)
Regarding that description ("that if we wish to find the value of [...]"), a similar, but more concise statement is given by Peter Landin in The Next 700 Programming Languages:
the thing an expression denotes, i.e., its "value", depends only on the values of its sub-expressions, not on other properties of them.
Thus:
One useful property of expressions is referential transparency. In essence this means the thing an expression denotes, i.e., its "value", depends only on the values of its sub-expressions, not on other properties of them.
Strachey provides some examples:
(page 12 of 39)
We tend to assume automatically that the symbol x in an expression such as 3x² + 2x + 17 stands for the same thing (or has the same value) on each occasion it occurs. This is the most important consequence of referential transparency and it is only in virtue of this property that we can use the where-clauses or λ-expressions described in the last section.
(and on page 16)
When the function is used (or called or applied) we write f[ε] where ε can be an expression. If we are using a referentially transparent language all we require to know about the expression ε in order to evaluate f[ε] is its value.
So referential transparency, by Strachey's original definition, implies purity - in the absence of an order of evaluation, observable and other effects are practically useless...
I'll quote John Mitchell's Concepts in Programming Languages. He says a pure functional language has to pass the declarative language test, i.e., be free from side effects:
"Within the scope of specific declarations of x1,...,xn, all occurrences of an expression e containing only variables x1,...,xn have the same value."
In linguistics, a name or noun phrase is considered referentially transparent if it may be replaced with another noun phrase with the same referent without changing the meaning of the sentence containing it.
This holds in the first case below, but gets too weird in the second.
Case 1:
"I saw Walter get into his new car."
If Walter owns a Centro, then we can replace that in the given sentence:
"I saw Walter get into his Centro."
Contrast with the second:
Case 2: "He was called William Rufus because of his red beard."
Rufus means somewhat red, and the reference is to William II of England.
"He was called William II because of his red beard." looks too awkward.
The traditional way to say it is that a language is referentially transparent if we may replace one expression with another of equal value anywhere in the program without changing the meaning of the program.
So referential transparency is a property of pure functional languages.
And if your program is free from side effects, then this property will hold.
So "give it up" is awesome advice, but "get it on" might also look good in this context.
Pure functions are those that return the same value on every call, and do not have side effects.
Referential transparency means that you can replace a bound variable with its value and still receive the same output.
Both pure and referentially transparent:
def f1(x):
    t1 = 3 * x
    t2 = 6
    return t1 + t2
Why is this pure?
Because it is a function of only the input x and has no side-effects.
Why is this referentially transparent?
You could replace t1 and t2 in f1 with their respective right hand sides in the return statement, as follows
def f2(x):
    return 3 * x + 6
and f2 will still always return the same result as f1 in every case.
Pure, but not referentially transparent:
Let's modify f1 as follows:
def f3(x):
    t1 = 3 * x
    t2 = 6
    x = 10
    return t1 + t2
Let us try the same trick again by replacing t1 and t2 with their right hand sides, and see if it is an equivalent definition of f3.
def f4(x):
    x = 10
    return 3 * x + 6
We can easily observe that f3 and f4 are not equivalent on replacing variables with their right hand sides / values. f3(1) would return 9 and f4(1) would return 36.
Referentially transparent, but not pure:
Simply modify f1 to read a non-local value x, as follows:
def f5():
    global x
    t1 = 3 * x
    t2 = 6
    return t1 + t2
Performing the same replacement exercise from before shows that f5 is still referentially transparent. However, it is not pure because it is not a function of only the arguments passed to it.
Observing carefully, the reason we lose referential transparency moving from f3 to f4 is that x is modified. In the general case, making a variable final (or, for those familiar with Scala, using vals instead of vars) and using immutable objects can help keep a function referentially transparent. This makes variables behave more like variables in the algebraic or mathematical sense, thus lending themselves better to formal verification.

What is a 'value' in the context of programming?

Can you suggest a precise definition for a 'value' within the context of programming without reference to specific encoding techniques or particular languages or architectures?
[Previous question text, for discussion reference: "What is value in programming? How to define this word precisely?"]
I just happened to be glancing through Pierce's "Types and Programming Languages" - he slips a reasonably precise definition of "value" in a programming context into the text:
[...] defines a subset of terms, called values, that are possible final results of evaluation
This seems like a rather tidy definition - i.e., we take the set of all possible terms, and the ones that can possibly be left over after all evaluation has taken place are values.
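To make Pierce's wording concrete with a toy illustration (mine, not Pierce's): in Python, the term 1 + 2 can still be evaluated, whereas 3 cannot be reduced any further - 3 is a possible final result of evaluation, i.e. a value.

term = "1 + 2"      # a term that is not yet a value
value = eval(term)  # evaluation stops at 3; nothing further to reduce
print(value)        # 3, a possible final result of evaluation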
Based on the ongoing comments about "bits" being an unacceptable definition, I think this one is a little better (although possibly still flawed):
A value is anything representable on a piece of possibly-infinite Turing machine tape.
Edit: I'm refining this some more.
A value is a member of the set of possible interpretations of any possibly-infinite sequence of symbols.
That is equivalent to the earlier definition based on Turing machine tape, but it actually generalises better.
Here, I'll take a shot: A value is a piece of stored information (in the information-theoretical sense) that can be manipulated by the computer.
(I won't say that a value has meaning; a random number in a register may have no meaning, but it's still a value.)
In short, a value is the meaning assigned to a variable (the variable being the object containing the value).
For example: type = boolean; name = help; variable = a storage location; value = what is stored in that location.
Breaking it down further:
X = 2, where X is a variable while 2 is the value stored in X.
Have you checked the article on Wikipedia?
In computer science, a value is a sequence of bits that is interpreted according to some data type. It is possible for the same sequence of bits to have different values, depending on the type used to interpret its meaning. For instance, the value could be an integer or floating point value, or a string.
Read the Wiki
Value = what we call the "contents" stored in the variable.
Variables = containers for storing data values.
Example: think of a folder named "Movies" (the variable); inside it are its contents, namely Pirates of the Caribbean, Fantastic Beasts, and La La Land (these, in turn, are what we call its values).

Creating a logic gate simulator

I need to make an application for creating logic circuits and seeing the results. This is primarily for use in A-Level (UK, 16-18 year olds generally) computing courses.
I've never made any applications like this, so I am not sure of the best design for storing the circuit and evaluating the results (at a reasonable speed, say 100 Hz on a 1.6 GHz single-core computer).
Rather than have the circuit built only from the basic gates (AND, OR, NAND, etc.), I want to allow these gates to be used to make "chips" which can then be used within other circuits (e.g. you might want to make an 8-bit register chip or a 16-bit adder).
The problem is that the number of gates increases massively with such circuits, so that if the simulation worked on each individual gate it would have thousands of gates to simulate. I therefore need to simplify these components so that they can be placed in a circuit and simulated quickly.
I thought about generating a truth table for each component, so the simulation could use a lookup table to find the outputs for a given input. The problem is that the size of such tables increases massively with the number of inputs: if a chip has 32 inputs, the truth table needs 2^32 (about 4.3 billion) rows, which in many cases is more memory than is available, so this isn't practical for non-trivial components. It also won't work with chips that can store state (e.g. registers), since they can't be represented as a simple table of inputs and outputs.
I know I could just hard-code things like register chips, but since this is for educational purposes I want people to be able to make their own components, as well as view and edit the implementations of standard ones. I considered allowing such components to be created and edited using code (e.g. DLLs or a scripting language), so that an adder, for example, could be represented as "output = inputA + inputB". However, that assumes the students have done enough programming in the given language to understand and write such plugins to mimic the results of their circuit, which is likely not to be the case...
Is there some other way to take a boolean logic circuit and simplify it automatically so that the simulation can determine the outputs of a component quickly?
As for storing the components I was thinking of storing some kind of tree structure, such that each component is evaluated once all components that link to its inputs are evaluated.
e.g. consider: A·B + C
The simulator would first evaluate the AND gate, and then evaluate the OR gate using the output of the AND gate and C.
However, it just occurred to me that cases where the outputs link back round to the inputs will cause a deadlock, because their inputs will never all be evaluated... How can I overcome this, since the program can only evaluate one gate at a time?
Have you looked at Richard Bowles's simulator?
You're not the first person to want to build their own circuit simulator ;-).
My suggestion is to settle on a minimal set of primitives. When I began mine (which I plan to resume one of these days...) I had two primitives:
Source: zero inputs, one output that's always 1.
Transistor: two inputs A and B, one output that's A and not B.
Obviously I'm misusing the terminology a bit, not to mention neglecting the niceties of electronics. On the second point I recommend abstracting to wires that carry 1s and 0s like I did. I had a lot of fun drawing diagrams of gates and adders from these. When you can assemble them into circuits and draw a box round the set (with inputs and outputs) you can start building bigger things like multipliers.
If you want anything with loops you need to incorporate some kind of delay -- so each component needs to store the state of its outputs. On every cycle you update all the new states from the current states of the upstream components.
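That cycle-based idea might look like the following sketch (Python; the Gate class and the two-phase step are my own illustration, not a reference implementation):

class Gate:
    def __init__(self, fn, inputs):
        self.fn = fn           # boolean function of the input values
        self.inputs = inputs   # upstream Gate objects
        self.state = 0         # output visible to downstream gates
        self.next_state = 0

def step(gates):
    for g in gates:  # phase 1: compute from the *current* states
        g.next_state = g.fn(*(i.state for i in g.inputs))
    for g in gates:  # phase 2: commit all outputs simultaneously
        g.state = g.next_state

# A NOT gate wired back to itself is a loop, but with stored state it
# simply oscillates once per cycle instead of deadlocking.
osc = Gate(lambda a: 1 - a, [])
osc.inputs = [osc]
for _ in range(4):
    step([osc])
    print(osc.state)  # 1, 0, 1, 0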
Edit: Regarding your concerns about scalability, how about defaulting to the first-principles method of simulating each component in terms of its state and upstream neighbours, but providing ways of optimising subcircuits:
If you have a subcircuit S with inputs A[m] with m < 8 (say, giving a maximum of 256 rows) and outputs B[n] and no loops, generate the truth table for S and use that. This could be done automatically for identified subcircuits (and reused if the subcircuit appears more than once) or by choice.
If you have a subcircuit with loops, you may still be able to generate a truth table. There are fixed-point finding methods which can help here.
If your subcircuit has delays (and they are significant to the enclosing circuit) the truth table can incorporate state columns. E.g. if the subcircuit has input A, inner state B, and output C, where C ← A and B, B ← A, the truth table could be (B′ is the next state):
A B | B′ C
0 0 | 0  0
0 1 | 0  0
1 0 | 1  0
1 1 | 1  1
If you have a subcircuit that the user asserts implements a particular known pattern such as "adder", provide an option for using a hard-coded implementation for updating that subcircuit instead of by simulating its inner parts.
When I made a circuit emulator (sadly, also incomplete and also unreleased), here's how I handled loops:
Each circuit element stores its boolean value
When an element "E0" changes its value, it notifies (via the observer pattern) all who depend on it
Each observing element evaluates its new value and does likewise
When the E0 change occurs, a level-1 list is kept of all elements affected. If an element already appears on this list, it gets remembered in a new level-2 list but doesn't continue to notify its observers. When the sequence which E0 began has stopped notifying new elements, the next queue level is handled: i.e., the sequence is followed and completed for the first element added to level 2, then the next added to level 2, and so on, until all of level x is complete; then you move to level x+1.
This is in no way complete. If you ever have multiple oscillators doing infinite loops, then no matter what order you take them in, one could prevent the other from ever getting its turn. My next goal was to alleviate this by limiting steps with clock-based syncing instead of cascading combinatorials, but I never got that far in my project.
You might want to take a look at the software for the From NAND to Tetris in 12 Steps course. There is a video talking about it on YouTube.
The course page is at: http://www1.idc.ac.il/tecs/
If you can disallow loops (outputs linking back to inputs), then you can significantly simplify the problem. In that case, for every input there will be exactly one definite output. Cycles, however, can make the output undecidable (or rather, constantly changing).
Evaluating a circuit without loops should be easy - just use the BFS algorithm with "junctions" (connections between logic gates) as the items in the list. Start off with all the inputs to all the gates in an "undefined" state. As soon as a gate has all inputs "defined" (either 1 or 0), calculate its output and add its output junctions to the BFS list. This way you only have to evaluate each gate and each junction once.
If there are loops, the same algorithm can be used, but the circuit can be built in such a way that it never comes to a "rest" and some junctions are always changing between 1 and 0.
Oops - actually, this algorithm can't be used in this case, because the looped gates (and the gates depending on them) would stay "undefined" forever.
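Here is a sketch of that BFS idea for the loop-free case (Python; the circuit encoding is my own, chosen for brevity):

from collections import deque

def evaluate(gates, inputs):
    # gates: name -> (fn, [source names]); inputs: name -> 0/1.
    values = dict(inputs)           # junctions defined so far
    pending = deque(gates.items())
    while pending:
        name, (fn, srcs) = pending.popleft()
        if all(s in values for s in srcs):
            values[name] = fn(*(values[s] for s in srcs))
        else:
            pending.append((name, (fn, srcs)))  # inputs not all defined yet
    return values

# A.B + C
circuit = {
    "and1": (lambda a, b: a & b, ["A", "B"]),
    "or1":  (lambda x, c: x | c, ["and1", "C"]),
}
print(evaluate(circuit, {"A": 1, "B": 0, "C": 1})["or1"])  # 1

# With a loop, some gate's sources are never all defined, so the queue
# spins forever - exactly the "undefined" deadlock described above.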
You could introduce them to the concept of Karnaugh maps, which would help them simplify truth values for themselves.
You could hard-code all the common ones, then allow them to build their own out of the hard-coded ones (which would include low-level gates), which would be evaluated by evaluating each sub-component. Finally, if one of their "chips" has fewer than X inputs/outputs, you could "optimize" it into a lookup table, as sketched below. Maybe detect how commonly it is used and only do this for the most-used Y chips? This way you have a good speed/space tradeoff.
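The lookup-table optimization could be as simple as this sketch (Python; tabulate and the full-adder example are illustrative, not prescribed by the answer):

def tabulate(chip_fn, n_inputs):
    # Enumerate all 2**n input combinations once, then answer by lookup.
    table = {}
    for bits in range(2 ** n_inputs):
        ins = tuple((bits >> k) & 1 for k in range(n_inputs))
        table[ins] = chip_fn(*ins)
    return table

def full_adder(a, b, cin):
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return (s, cout)

table = tabulate(full_adder, 3)  # 3 inputs -> only 8 rows
print(table[(1, 1, 0)])          # (0, 1): 1 + 1 = 0, carry 1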
You could always JIT compile the circuits...
As I haven't really thought about it, I'm not really sure what approach I'd take.. but it would possibly be a hybrid method and I'd definitely hard code popular "chips" in too.
When I was playing around making a "digital circuit" simulation environment, I had each defined circuit (a basic gate, a mux, a demux, and a couple of other primitives) associated with a transfer function (that is, a function that computes all outputs based on the present inputs), an "agenda" structure (basically a linked list of "when to activate a specific transfer function" entries), virtual wires, and a global clock.
I arbitrarily set the wires to hard-modify the inputs whenever the output changed, and the act of changing an input on any circuit schedules its transfer function to be called after the gate delay. With this at hand, I could accommodate both clocked and unclocked circuit elements (a clocked element has its transfer function run at "next clock transition, plus gate delay"; an unclocked element just depends on the gate delay).
Never really got around to build a GUI for it, so I've never released the code.
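For what it's worth, that agenda idea might be sketched like this (Python; the scheduling model and all names here are illustrative, not the original unreleased code):

import heapq

class Sim:
    def __init__(self):
        self.now = 0
        self.agenda = []  # heap of (time, seq, action); seq breaks ties
        self.seq = 0

    def schedule(self, delay, action):
        heapq.heappush(self.agenda, (self.now + delay, self.seq, action))
        self.seq += 1

    def run(self):
        while self.agenda:
            self.now, _, action = heapq.heappop(self.agenda)
            action()

sim = Sim()
wire = {"value": 0}

def inverter():  # a transfer function with a gate delay of 2
    print("t=%d: inverter drives %d" % (sim.now, 1 - wire["value"]))

wire["value"] = 1          # an input change...
sim.schedule(2, inverter)  # ...schedules the gate after its delay
sim.run()                  # t=2: inverter drives 0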