What is really happening when using variables? - function

I have a really basic question about something that I've never paid much attention to until now:
I noticed that when creating a function (in JS or Python) that uses a variable from the outer scope, the function is not defined using the value of the variable but rather the variable itself. So if I change the value of the variable the function will use the new value.
This is what I mean:
let a = 10;
function printA(){
  console.log(a);
}
printA(); // => 10
a = 20;
printA(); // => 20
a = 10
def printA():
    print(a)
printA() # => 10
a = 20
printA() # => 20
I thought this was only going to work for objects, because of the way you can modify an object inside a function, but not for primitive variables, because there's no way to change their value without reassigning them. I guess this is a different story.
What I'm trying to understand is: when I type a variable name, am I really typing its memory address? Does this happen in all languages?

when I create a function like printA() that uses a variable that is not an argument, is the variable bound forever to the function by its address?
The variable a is "captured" by the function. The specifics of how that happens are usually implementation details and may result in the compiler/interpreter producing code that doesn't much resemble the original.
For instance, in C# (I know, not one of the languages you mentioned, but it's the one I'm most familiar with), the compiler will create a separate hidden class which actually contains fields for the variables that are captured by a lambda or nested function. It then accesses these fields rather than plain variables.
by its address
Variables don't typically have an address. For instance, every time you call a method, it will typically have an "activation record" of some kind created, that will typically contain its variables. But note that these records are not at some fixed location, which is how you can have parallel execution of methods, recursion, etc, without interference. (Some older BASICs did have fixed activation records, which is why they didn't allow for recursion). These activation records may typically be placed on some kind of stack.
But as I say, for captured variables, the compiler will typically need to do even more so that those variables aren't just stored in an activation record, and so that their lifetime is no longer tied to a single call.
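To make the idea of capture concrete, here is a minimal Python sketch (the names make_counter and increment are invented for illustration). It shows that the nested function holds on to the variable itself rather than a snapshot of its value, and that the variable outlives the call that created it:

```python
def make_counter():
    count = 0  # local to make_counter, but captured by the nested function

    def increment():
        nonlocal count  # rebind the captured variable, not a copy of its value
        count += 1
        return count

    return increment


counter = make_counter()  # make_counter has returned, yet count lives on
first = counter()
second = counter()

other = make_counter()    # a separate call gets its own captured variable
other_first = other()

print(first, second, other_first)  # 1 2 1
```

How the interpreter stores the captured variable is an implementation detail, which is the point made above: the program only relies on the variable staying alive for as long as the function that captured it.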

Related

When to use scoped or public variables?

I am having some confusion between when to use a scoped variable. When you declare it in a function like this.
int main(){
int x;
}
And when to use a public variable like this.
int x;
int main(){
}
The scoped variable being the one only available in the function it is declared in and the public variable being the one available to the entire file. Could you help me understand when to use this?
Ask yourself these questions:
Do you intend to use x in the main function only?
Are you going to pass x as a parameter to main's subfunctions?
Then it should be scoped.
The creators of modular programming will be happy to see this.
Does your app consist of many files?
Is x something (like global state) that your app needs in all those files?
Then x should be global (declared in a header file, as best practice) so it can be used as extern in other files.
If you are just starting to write your app:
If you start with x as a scoped variable and end up passing it to every function across all files, then you should change it to global scope.
If your app is already developed and x is being added as new functionality:
Then you should already know whether x represents global state.
Here's a simple rule of thumb: declare a variable in the narrowest possible scope in which you'll be using it.
As for an explanation of why: in C and C++ (and other languages), variables essentially "cease to exist" when they go out of scope. If you declare a variable outside of a function, that variable gets processed and stored in memory before the function is called. Each time you call the function and use that variable, the same bit of memory is accessed, and as a result the variable maintains its value between calls.
Meanwhile, if you declare a variable inside a function (for example; in C/C++, I think anything with curly brackets in it defines a narrower scope), it gets allocated and stored in memory only when it comes into scope. When that scope ends (e.g. with a return statement at the end of a method), all the memory that was in that scope is released.
This all ties into the stack, which is one of the two main ways that C and C++ handle dynamic memory. Here's a serviceable overview of how C programs lay out memory; suffice it to say that the reasons for the rule of thumb above are that
you don't want to consume any more memory than you need at any given time. In other words, you don't want a variable sitting around that you're never going to use.
It's much easier to debug a problem when the variable has a limited scope, and you know everything that could be affecting it, than when it's in the global scope and everything could be affecting it.
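The difference in lifetime can be sketched in Python as an analogy (Python manages memory differently from C and C++, so this only illustrates the visible behaviour, not stack allocation; all names are made up for the example). The module-level variable keeps its value between calls and can be changed from anywhere, while the local one is created afresh on every call:

```python
call_count = 0  # module-level: one variable shared by every call


def count_calls():
    global call_count  # explicitly opt in to modifying the shared variable
    call_count += 1
    return call_count


def sum_of_squares(values):
    total = 0  # local: created on each call and gone once the call returns
    for value in values:
        total += value * value
    return total


results = [count_calls(), count_calls()]
squares = [sum_of_squares([1, 2]), sum_of_squares([1, 2])]
print(results)  # [1, 2]
print(squares)  # [5, 5]
```

The second function is easier to reason about because nothing outside it can affect total, which is the debugging argument made above.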

Naming parameters the same as the variable passed to them?

Are there any rules (or will I run into any problems) if I name the parameters of a function the same as the variable I will pass into them?
For example in Python:
def foo(param):
pass
param = 2
foo(param)
In the fairly limited programming I've done, I have not run into any problems doing this. Will I get problems in certain languages? Is this okay to do, or is it a practice to be avoided?
The "problem" in this particular case is that the function parameter name will shadow the outer param variable; i.e. you cannot (implicitly) refer to the global param anymore because inside your function param is defined as a local variable.
But, this is really as it should be. Your function should only worry about the parameters it declares locally, not about implicit global variables. Conversely, a caller of a function should not have to worry about anything that goes on inside a function. Naming a variable the same as a parameter to a function is of no consequence to the caller, and should be of no consequence to the function itself.
So, no, there's absolutely no issue here.
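A short Python sketch of the shadowing described above, extending the example from the question (the multiplication is only there to make the effect visible):

```python
param = 2  # module-level variable


def foo(param):
    # Inside foo, "param" is the parameter; the module-level name is shadowed.
    param = param * 10  # rebinds only the local name
    return param


result = foo(param)
print(result, param)  # 20 2
```

The caller's variable is unchanged after the call, so sharing the name has no consequence for either side.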

When is passing a subprogram as a parameter necessary

I've been reading Concepts of Programming Languages by Robert W. Sebesta, and in chapter 9 there is a brief section on passing a subprogram to a function as a parameter. The section on this is extremely brief, about 1.5 pages, and the only explanation of its application is:
When a subprogram must sample some mathematical function. Such as a Subprogram that does numerical integration by estimating the area under a graph of a function by sampling the function at a number of different points. Such a Subprogram should be usable everywhere.
This is completely off from anything I have ever learned. If I were to approach this problem in my own way I would create a function object and create a function that accomplishes the above and accepts function objects.
I have no clue why this is a design issue for languages because I have no idea where I would ever use this. A quick search hasn't made this any clearer for me.
Apparently you can accomplish this in C and C++ by utilizing pointers. Languages that allow nested subprograms, such as JavaScript, allow you to do this in 3 separate ways:
function sub1() {
  var x;
  function sub2() {
    alert( x ); //Creates a dialog box with the value of x
  };
  function sub3() {
    var x;
    x = 3;
    sub4( sub2 ); //*shallow binding* the environment of the
                  //call statement that enacts the passed
                  //subprogram
  };
  function sub4( subx ) {
    var x;
    x = 4;
    subx();
  };
  x=1;
  sub3();
};
I'd appreciate any insight offered.
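For comparison, here is a rough Python transcription of the example above (an editorial sketch, not taken from the book; alert is replaced by a return value so the result can be inspected). Python resolves free variables lexically, so the passed function sees the x of the scope where it was defined, not the x of sub3 or sub4:

```python
def sub1():
    def sub2():
        return x  # free variable, resolved in the enclosing sub1 scope

    def sub3():
        x = 3  # local to sub3, invisible to sub2
        return sub4(sub2)

    def sub4(subx):
        x = 4  # local to sub4, invisible to sub2
        return subx()

    x = 1
    return sub3()


seen = sub1()
print(seen)  # 1
```

Under shallow binding the passed function would instead see the x of sub4, and under ad hoc binding the x of sub3; neither is what Python does here.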
Being able to pass "methods" is very useful for a variety of reasons. Among them:
Code which is performing a complicated operation might wish to provide a means of either notifying a user of its progress or allowing the user to cancel it. Having the code for the complicated operation do those actions itself would both add complexity to it and cause ugliness if it's invoked from code which uses a different style of progress bar or "Cancel" button. By contrast, having the caller supply an UpdateStatusAndCheckCancel() method means that the caller can supply a method which will update whatever style of progress bar and cancellation method the caller wants to use.
Being able to store methods within a table can greatly simplify code that needs to export objects to a file and later import them again. Rather than needing to have code say
if (ObjectType == "Square")
  AddObject(new Square(ObjectParams));
else if (ObjectType == "Circle")
  AddObject(new Circle(ObjectParams));
etc. for every kind of object,
code can say something like
if (ObjectCreators.TryGetValue(ObjectType, out factory))
  AddObject(factory(ObjectParams));
to handle all kinds of object whose creation methods have been added to ObjectCreators.
Sometimes it's desirable to be able to handle events that may occur at some unknown time in the future; the author of code which knows when those events occur might have no clue about what things are supposed to happen then. Allowing the person who wants the action to happen to give a method to the code which will know when it happens allows for that code to perform the action at the right time without having to know what it should do.
The first situation represents a special case of callback where the function which is given the method is expected to only use it before it returns. The second situation is an example of what's sometimes referred to as a "factory pattern" or "dependency injection" [though those terms are useful in some broader contexts as well]. The third case is commonly handled using constructs which frameworks refer to as events, or else with an "observer" pattern [the observer asks the observable object to notify it when something happens].
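Two of these uses can be sketched in Python. The first function is the numerical-integration case from the question (here using the trapezoidal rule, which is one possible choice); the second part is a table of creation functions in the spirit of ObjectCreators. All names below are hypothetical and the "objects" are plain tuples to keep the sketch short:

```python
def integrate(f, a, b, steps=1000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    width = (b - a) / steps
    total = (f(a) + f(b)) / 2
    for i in range(1, steps):
        total += f(a + i * width)  # f is only used before integrate returns
    return total * width


def make_square(params):
    return ("Square", params["side"])


def make_circle(params):
    return ("Circle", params["radius"])


# Table of creation functions, keyed by the type name stored in the file.
OBJECT_CREATORS = {"Square": make_square, "Circle": make_circle}


def load_object(object_type, params):
    factory = OBJECT_CREATORS.get(object_type)
    if factory is None:
        raise ValueError(f"unknown object type: {object_type}")
    return factory(params)


area = integrate(lambda x: x * x, 0.0, 3.0)   # close to the exact value 9
shape = load_object("Circle", {"radius": 2})  # ("Circle", 2)
```

Supporting a new kind of object then means adding one entry to the table rather than another branch to an if/else chain, and integrate works for any function the caller passes in.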

Is it good practice to name variables differently when defining more than one function?

For example, in this simple function where fun1 takes as input two numbers, adds them together and passes them to function 2 for printing the output. var1_in is local to each function, so is it OK to use the name var1_in in both functions, or is it better practice to call them different things?
fun1 <- function (var1_in, var2_in) {
var3 = var1_in + var2_in
fun2(var3)
}
fun2 <- function (var1_in) {
var4 = var1_in
print(var4)
}
As long as the functions are short enough to understand easily, identifying the scope of local variables and parameters will be easy as well. But there isn't a hard and fast rule for this. What's important is that the code is easy to understand and that the names of variables are relevant and meaningful, regardless of whether this means name duplication. Modern IDEs will also help here by highlighting the instances of such variables, making it easy to see their declaration and various usage points. The point is that I would focus more on quality and meaningful naming than on duplication of variable names.
EDIT - Of course, one situation to avoid would be naming a local variable or parameter the same as a global variable. This can confuse things greatly and lead to many a subtle bug.
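The question uses R, but the kind of subtle bug meant here is easy to show in Python (a sketch with invented names): assigning to a name inside a function creates a new local variable, so code that looks like it updates the global silently does not.

```python
total = 100  # module-level variable


def add_to_total(amount):
    total = amount  # creates a NEW local named total; the global is untouched
    total += 1
    return total


returned = add_to_total(5)
print(returned, total)  # 6 100
```

A reader skimming add_to_total could easily assume the module-level total changed, which is why reusing a global's name for a local is worth avoiding.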

How do you return two values from a single method?

When you're in a situation where you need to return two things in a single method, what is the best approach?
I understand the philosophy that a method should do one thing only, but say you have a method that runs a database select and you need to pull two columns. I'm assuming you only want to traverse through the database result set once, but you want to return two columns worth of data.
The options I have come up with:
1. Use global variables to hold returns. I personally try and avoid globals where I can.
2. Pass in two empty variables as parameters then assign the variables inside the method, which now is a void. I don't like the idea of methods that have side effects.
3. Return a collection that contains two variables. This can lead to confusing code.
4. Build a container class to hold the double return. This is more self-documenting than a collection containing other collections, but it seems like it might be confusing to create a class just for the purpose of a return.
This is not entirely language-agnostic: in Lisp, you can actually return any number of values from a function, including (but not limited to) none, one, two, ...
(defun returns-two-values ()
  (values 1 2))
The same thing holds for Scheme and Dylan. In Python, I would actually use a tuple containing 2 values like
def returns_two_values():
    return (1, 2)
As others have pointed out, you can return multiple values using the out parameters in C#. In C++, you would use references.
void
returns_two_values(int& v1, int& v2)
{
v1 = 1; v2 = 2;
}
In C, your method would take pointers to locations, where your function should store the result values.
void
returns_two_values(int* v1, int* v2)
{
*v1 = 1; *v2 = 2;
}
For Java, I usually use either a dedicated class, or a pretty generic little helper (currently, there are two in my private "commons" library: Pair<F,S> and Triple<F,S,T>, both nothing more than simple immutable containers for 2 resp. 3 values)
I would create data transfer objects. If it is a group of information (first and last name) I would make a Name class and return that. #4 is the way to go. It seems like more work up front (which it is), but makes it up in clarity later.
If it is a list of records (rows in a database) I would return a Collection of some sort.
I would never use globals unless the app is trivial.
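A minimal Python sketch of the data-transfer-object idea, using a Name class as suggested above. The function name and the list of dictionaries standing in for a database result set are assumptions made for the example:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    first: str
    last: str


def find_customer_name(rows, customer_id):
    """Walk the rows once and return both columns as a single Name."""
    for row in rows:
        if row["id"] == customer_id:
            return Name(first=row["first_name"], last=row["last_name"])
    return None  # no matching record


# Stand-in for a result set; a real query would produce these rows.
rows = [{"id": 7, "first_name": "Ada", "last_name": "Lovelace"}]
name = find_customer_name(rows, 7)
print(name.first, name.last)  # Ada Lovelace
```

The caller reads name.first and name.last instead of remembering positions, which is where the extra clarity comes from.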
Not my own thoughts (Uncle Bob's):
If there's cohesion between those two variables - I've heard him say, you're missing a class where those two are fields. (He said the same thing about functions with long parameter lists.)
On the other hand, if there is no cohesion, then the function does more than one thing.
I think the most preferred approach is to build a container (may it be a class or a struct - if you don't want to create a separate class for this, struct is the way to go) that will hold all the parameters to be returned.
In the C/C++ world it would actually be quite common to pass two variables by reference (an example, your no. 2).
I think it all depends on the scenario.
Thinking from a C# mentality:
1: I would avoid globals as a solution to this problem, as it is accepted as bad practice.
4: If the two return values are uniquely tied together in some way or form that it could exist as its own object, then you can return a single object that holds the two values. If this object is only being designed and used for this method's return type, then it likely isn't the best solution.
3: A collection is a great option if the returned values are the same type and can be thought of as a collection. However, if the specific example needs 2 items, and each item is its 'own' thing (maybe one represents the beginning of something and the other represents the end), and the returned items are not being used interchangeably, then this may not be the best option.
2: I like this option the best, if 4 and 3 do not make sense for your scenario. As stated in 3, if you wanted to get two objects that represent the beginning and end items of something, then I would use parameters by reference (or out parameters, again, depending on how it's all being used). This way your parameters can explicitly define their purpose: MethodCall(ref object StartObject, ref object EndObject)
Personally I try to use languages that allow functions to return something more than a simple integer value.
First, you should distinguish what you want: an arbitrary-length return or fixed-length return.
If you want your method to return an arbitrary number of values, you should stick to collection returns. Collections--whatever your language is--are designed specifically for such a task.
But sometimes you just need to return two values. How does returning two values--when you're sure it's always two values--differ from returning one value? No way it differs, I say! And modern languages, including Perl, Ruby, C++, Python, OCaml etc., allow a function to return tuples, either built in or as third-party syntactic sugar (yes, I'm talking about boost::tuple). It looks like this:
tuple<int, int, double> add_multiply_divide(int a, int b) {
  return make_tuple(a+b, a*b, double(a)/double(b));
}
Specifying an "out parameter" is, in my opinion, overused because of the limitations of older languages and the paradigms learned in those days. But there are still many cases where it's useful (if your method needs to modify an object passed as a parameter, that object not being an instance of the class that contains the method).
The conclusion is that there's no generic answer--each situation has its own solution. But there is one common thing: a function returning several items does not violate any paradigm. That's a language limitation that was later somehow transferred to the human mind.
Python (like Lisp) also allows you to return any number of values from a function, including (but not limited to) none, one, two
def quadcube (x):
    return x**2, x**3
a, b = quadcube(3)
Some languages make doing #3 native and easy. Example: Perl. "return ($a, $b);". Ditto Lisp.
Barring that, check if your language has a collection suited to the task, à la pair/tuple in C++.
Barring that, create a pair/tuple class and/or collection and re-use it, especially if your language supports templating.
If your function has return value(s), it's presumably returning it/them for assignment to either a variable or an implied variable (to perform operations on, for instance.) Anything you can usefully express as a variable (or a testable value) should be fair game, and should dictate what you return.
Your example mentions a row or a set of rows from a SQL query. Then you reasonably should be ready to deal with those as objects or arrays, which suggests an appropriate answer to your question.
When you're in a situation where you need to return two things in a single method, what is the best approach?
It depends on WHY you are returning two things.
Basically, as everyone here seems to agree, #2 and #4 are the two best answers...
I understand the philosophy that a method should do one thing only, but say you have a method that runs a database select and you need to pull two columns. I'm assuming you only want to traverse through the database result set once, but you want to return two columns worth of data.
If the two pieces of data from the database are related, such as a customer's First Name and Last Name, I would indeed still consider this to be doing "one thing."
On the other hand, suppose you have come up with a strange SELECT statement that returns your company's gross sales total for a given date, and also reads the name of the customer that placed the first sale for today's date. Here you're doing two unrelated things!
If it's really true that performance of this strange SELECT statement is much better than doing two SELECT statements for the two different pieces of data, and both pieces of data really are needed on a frequent basis (so that the entire application would be slower if you didn't do it that way), then using this strange SELECT might be a good idea - but you had better be prepared to demonstrate why your way really makes a difference in perceived response time.
The options I have come up with:
1 Use global variables to hold returns. I personally try and avoid globals where I can.
There are some situations where creating a global is the right thing to do. But "returning two things from a function" is not one of those situations. Doing it for this purpose is just a Bad Idea.
2 Pass in two empty variables as parameters then assign the variables inside the method, which now is a void.
Yes, that's usually the best idea. This is exactly why "by reference" (or "output", depending on which language you're using) parameters exist.
I don't like the idea of methods that have side effects.
Good theory, but you can take it too far. What would be the point of calling SaveCustomer() if that method didn't have a side-effect of saving the customer's data?
By Reference parameters are understood to be parameters that contain returned data.
3 Return a collection that contains two variables. This can lead to confusing code.
True. It wouldn't make sense, for instance, to return an array where element 0 was the first name and element 1 was the last name. This would be a Bad Idea.
4 Build a container class to hold the double return. This is more self-documenting than a collection containing other collections, but it seems like it might be confusing to create a class just for the purpose of a return.
Yes and no. As you say, I wouldn't want to create an object called FirstAndLastNames just to be used by one method. But if there was already an object which had basically this information, then it would make perfect sense to use it here.
If I was returning two of the exact same thing, a collection might be appropriate, but in general I would usually build a specialized class to hold exactly what I needed.
And if you are returning two things today from those two columns, tomorrow you might want a third. Maintaining a custom object is going to be a lot easier than any of the other options.
Use var/out parameters or pass variables by reference, not by value. In Delphi:
function ReturnTwoValues(out Param1: Integer):Integer;
begin
Param1 := 10;
Result := 20;
end;
If you use var instead of out, you can pre-initialize the parameter.
With databases, you could have an out parameter per column and the result of the function would be a boolean indicating if the record is retrieved correctly or not. (Although I would use a single record class to hold the column values.)
As much as it pains me to do it, I find the most readable way to return multiple values in PHP (which is what I work with, mostly) is using a (multi-dimensional) array, like this:
function doStuff($someThing)
{
// do stuff
$status = 1;
$message = 'it worked, good job';
return array('status' => $status, 'message' => $message);
}
Not pretty, but it works and it's not terribly difficult to figure out what's going on.
I generally use tuples. I mainly work in C# and it's very easy to design generic tuple constructs. I assume it would be very similar for most languages which have generics. As an aside, 1 is a terrible idea, and 3 only works when you are getting two returns that are the same type, unless you work in a language where everything derives from the same basic type (i.e. object). 2 and 4 are also good choices. 2 doesn't introduce any side effects a priori, it's just unwieldy.
Use std::vector, QList, or some managed library container to hold however many X you want to return:
QList<X> getMultipleItems()
{
  QList<X> returnValue;
  for (int i = 0; i < countOfItems; ++i)
  {
    returnValue.push_back(<your data here>);
  }
  return returnValue;
}
For the situation you described, pulling two fields from a single table, the appropriate answer is #4 given that two properties (fields) of the same entity (table) will exhibit strong cohesion.
Your concern that "it might be confusing to create a class just for the purpose of a return" is probably not that realistic. If your application is non-trivial you are likely going to need to re-use that class/object elsewhere anyway.
You should also consider whether the design of your method is primarily returning a single value, and you are getting another value for reference along with it, or if you really have a single returnable thing like first name - last name.
For instance, you might have an inventory module that queries the number of widgets you have in inventory. The return value you want to give is the actual number of widgets. However, you may also want to record how often someone is querying inventory and return the number of queries so far. In that case it can be tempting to return both values together. However, remember that you have class variables available for storing data, so you can store an internal query count, not return it every time, and then use a second method call to retrieve the related value. Only group the two values together if they are truly related. If they are not, use separate methods to retrieve them separately.
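The inventory example could look roughly like this in Python (a sketch; the class and method names are invented, and the widget count is passed in rather than read from a real data source):

```python
class Inventory:
    def __init__(self, widget_count):
        self._widget_count = widget_count
        self._query_count = 0  # internal bookkeeping, not part of the result

    def widget_count(self):
        """Return the number of widgets and note that a query happened."""
        self._query_count += 1
        return self._widget_count

    def query_count(self):
        """Return how many times widget_count has been called so far."""
        return self._query_count


inventory = Inventory(12)
widgets = inventory.widget_count()
inventory.widget_count()
queries = inventory.query_count()
print(widgets, queries)  # 12 2
```

Callers that only care about widgets never see the query count, and callers that want it ask for it explicitly.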
Haskell also allows multiple return values using built in tuples:
sumAndDifference :: Int -> Int -> (Int, Int)
sumAndDifference x y = (x + y, x - y)
> let (s, d) = sumAndDifference 3 5 in s * d
-16
Because Haskell is a pure language, options 1 and 2 are not allowed.
Even using a state monad, the return value contains (at least conceptually) a bag of all relevant state, including any changes the function just made. It's just a fancy convention for passing that state through a sequence of operations.
I will usually opt for approach #4, as I prefer the clarity of knowing that what the function produces or calculates is its return value (rather than byref parameters). Also, it lends itself to a rather "functional" style in program flow.
The disadvantage of option #4 with generic tuple classes is it isn't much better than returning a collection (the only gain is type safety).
public IList CalculateStuffCollection(int arg1, int arg2)
public Tuple<int, int> CalculateStuffTuple(int arg1, int arg2)
var resultCollection = CalculateStuffCollection(1,2);
var resultTuple = CalculateStuffTuple(1,2);
resultCollection[0] // Was it index 0 or 1 I wanted?
resultTuple.A // Was it A or B I wanted?
I would like a language that allowed me to return an immutable tuple of named variables (similar to a dictionary, but immutable, typesafe and statically checked). But, sadly, such an option isn't available to me in the world of VB.NET, it may be elsewhere.
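As one example of what this could look like elsewhere, Python's typing.NamedTuple comes close: the fields are named and immutable, and the declared types can be checked by an external type checker (Python itself does not enforce them at run time). The names StuffResult, total and product are made up for this sketch:

```python
from typing import NamedTuple


class StuffResult(NamedTuple):
    total: int
    product: int


def calculate_stuff(arg1: int, arg2: int) -> StuffResult:
    return StuffResult(total=arg1 + arg2, product=arg1 * arg2)


result = calculate_stuff(1, 2)
print(result.total, result.product)  # 3 2

# It is still a tuple, so positional unpacking keeps working.
total, product = result
```

Attempting to assign to result.total raises AttributeError, so the returned values cannot be modified after the fact.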
I dislike option #2 because it breaks that "functional" style and forces you back into a procedural world (when often I don't want to do that just to call a simple method like TryParse).
I have sometimes used continuation-passing style to work around this, passing a function value as an argument, and returning the result of calling that function with the multiple values.
In languages without first-class functions, objects can take the place of function values.
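A small Python sketch of the continuation-passing idea, with hypothetical names: instead of returning two values, the function hands both to a caller-supplied function and returns whatever that produces.

```python
def divide_with_remainder(dividend, divisor, on_result):
    quotient, remainder = divmod(dividend, divisor)
    # Both values go to the continuation rather than back to the caller.
    return on_result(quotient, remainder)


description = divide_with_remainder(
    17, 5, lambda quotient, remainder: f"{quotient} remainder {remainder}"
)
print(description)  # 3 remainder 2
```

The continuation's parameter names document what each value means, so there is no container type to define and no positions to remember.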
My choice is #4. Define a reference parameter in your function. That parameter refers to a value object.
In PHP:
class TwoValuesVO {
  public $expectedOne;
  public $expectedTwo;
}
/* parameter $_vo refers to a TwoValuesVO instance */
function twoValues( & $_vo ) {
  $_vo->expectedOne = 1;
  $_vo->expectedTwo = 2;
}
In Java:
class TwoValuesVO {
  public int expectedOne;
  public int expectedTwo;
}
class TwoValuesTest {
  void twoValues( TwoValuesVO vo ) {
    vo.expectedOne = 1;
    vo.expectedTwo = 2;
  }
}