What does an expression tree look like when function calls are involved?

I've found many places that show expression trees involving operators (+, -, *, &&, ||, etc.), but I cannot find an example where functions (with zero or more arguments) are involved.
How would the following expression be represented as an expression tree?
mid( "This is a string", 1*2, ceil( 4.2 ) ) == "is i"
Thanks a million in advance.

After weeks of research, I was not able to find the "official" (academic) answer to this question, so I took my own path, and I can say it works smoothly.
I'm offering it here because so far no one has given an answer, in case it helps someone.
By asking this question, I wanted to know whether I should place the arguments passed to a function as child nodes of the 'function node' or as a property (data) of the 'function node'.
I evaluated the pros and cons of both options. Since nodes in an AST can store as much information as you need (the two children 'left' and 'right' are just the minimum), storing the arguments in the node looked like the easiest approach: it is easy to implement and it works perfectly.
This was my choice: place the arguments of the function as data in the 'function node'.
But if anyone has a better answer, please share it here.

It might help to think of an expression tree as already being a way of representing functions applied to a set of arguments. For example, a - node has two children, which you can think of as representing the two ordered inputs to the “minus” function.
With that in mind, you can generalize your expression tree by allowing each node to contain an arbitrary function with one child per argument to the function. For example, if you have a function max that returns the maximum of two values, the max node would have two children. If you have a function median that takes three arguments and returns the median, it would have three children.
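A minimal Python sketch of that idea, applied to the expression from the question. The Node class, the FUNCTIONS table, and the helper names are made up for illustration, and the semantics of mid (zero-based start index, inclusive end index) are an assumption, since the question does not define mid.

```python
import math
import operator
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Node:
    func: Optional[str] = None   # name of the function or operator; None for a literal
    value: Any = None            # payload of a literal node
    children: List["Node"] = field(default_factory=list)  # one child per argument


# Hypothetical function table. Operators are just two-argument functions.
FUNCTIONS = {
    "*": operator.mul,
    "==": operator.eq,
    "ceil": math.ceil,
    # Assumed semantics for mid: zero-based start, inclusive end.
    "mid": lambda text, start, end: text[start:end + 1],
}


def lit(value):
    """Build a leaf node holding a literal value."""
    return Node(value=value)


def call(func, *args):
    """Build a function node with one child per argument."""
    return Node(func=func, children=list(args))


def evaluate(node):
    """Evaluate children first, then apply the node's function to the results."""
    if node.func is None:
        return node.value
    args = [evaluate(child) for child in node.children]
    return FUNCTIONS[node.func](*args)


# mid("This is a string", 1*2, ceil(4.2)) == "is i"
tree = call(
    "==",
    call("mid", lit("This is a string"), call("*", lit(1), lit(2)), call("ceil", lit(4.2))),
    lit("is i"),
)
print(evaluate(tree))
```

The mid node simply has three children, in argument order, and a zero-argument function would be a function node with no children. Operators and named functions go through the same evaluate path.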

Related

XQuery - calculating number of elements

I'm trying to declare a user-defined function in XQuery, that would be passed an element and would return the total number of elements in its tree (meaning itself plus its subtree).
Is this even possible to do in XQuery with a recursive function or will I need another approach?
Yes, this is possible. As this smells like homework, I'm not giving a full answer, only the idea of how to do it.
In both approaches, you'll have to decide which kinds of children to count. Reading your question, it looks like you're looking for elements only and can safely ignore attributes, comments, text nodes and processing instructions.
Using a Recursive Function
Define a function, which sums up the size of each individual subtree (which you determine by a recursive function call). Something like (this is not XQuery code!):
function subtree_size(element) {
    1 + sum(
        for each child element of element
            return subtree_size(child element)
    )
}
Passing all Elements to the count Function
XQuery has a count function, which returns the number of elements passed. There is a very short and rather easy to find XPath expression to return all descendant nodes (including the node itself). Have a look at the axis steps available in XPath.
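For comparison, here is how the same two ideas look in Python with the standard xml.etree.ElementTree module. This is a sketch in a different language, not XQuery, and the function names and sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def subtree_size(element):
    """Recursive version: the element itself plus the size of each child subtree."""
    return 1 + sum(subtree_size(child) for child in element)


def subtree_size_flat(element):
    """Flat version: iter() yields the element and all of its descendant elements."""
    return sum(1 for _ in element.iter())


root = ET.fromstring("<a><b><c/>text</b><d/></a>")
print(subtree_size(root), subtree_size_flat(root))
```

Both functions count elements only. Text is stored on the elements rather than as separate nodes in this library, so it never enters the count.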

Recursive Functions - Two Functions or Last Optional Parameter

When writing recursive functions, it sometimes happens that something should happen only on the first pass of the recursive algorithm. When this is the case, I have two options:
1. Have an optional parameter called "first run" which is true by default but is passed as false on recursive calls
2. Have two functions
Which option is preferable? If it is the latter, what should I name these functions? (E.g. if it's a flood fill algorithm, would I choose FloodFill and FloodFillRecursive?)
Thanks in advance, ell.
I might use two functions, and I would say that the function that will be called should be named FloodFill: the user doesn't need to know how that function is implemented, so it should not be named FloodFillRecursive.
Actually, FloodFillRecursive could be the name of the inner function: the one that contains the implementation and is called by the function the user calls, since it is that second function that is recursive.
Ideally, that function should not be visible to users: it should be hidden in your library (either truly hidden, or marked by a naming convention that tells users not to call it directly).
This way, if you change the implementation, your users will not be calling a FloodFillRecursive function that might not be recursive anymore.
It really depends on whether the function is intended to be usable by third-party developers. If it is, it might be preferable to use the two-function approach for neatness's sake, with the second function (FloodFillRecursive) private/internal to your library.
If it's not then the optional parameter approach is fine.
Option 2 is better in every case I can think of. It depends on the language you're using, but with option 1 you will probably pay more (entirely avoidable) overhead for passing an additional argument on every call.
For the naming convention, use a normal name for the outer function (eg FloodFill). For the inner function I'd say FloodFillRecursive or FloodFillInner are good choices.
If the language allows it, then in my opinion the best approach is to have one function with the official "clean" interface, and then use a local function (not visible outside) for the recursion.
For example in Common Lisp
(defun n-queens (n)
  (let ((result (list)))
    (labels ((place-queen (row free-cols free-diagonals free-counter-diagonals)
               ...))
      (place-queen 0 ...)
      result)))
or Python
def n_queens(n):
    result = []
    def place_queen(row, free_cols, free_diags, free_counter_diags):
        ...
    place_queen(0, ...)
    return result
In the above example the recursive function requires many parameters (e.g. the still-free columns, diagonals and counter-diagonals), but the official public function accepts only one parameter and the recursion is handled internally.
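Applied to the flood fill from the question, the same pattern might look like the sketch below. It assumes the grid is a list of lists and that cells are connected in four directions; both are choices made here for illustration, not something the question specifies.

```python
def flood_fill(grid, row, col, new_color):
    """Public entry point: recolor the connected region containing (row, col)."""
    old_color = grid[row][col]
    if old_color == new_color:
        return grid  # first-pass check: nothing to do, and it prevents endless recursion

    def fill(r, c):
        # Hidden recursive helper; stops at the grid edge or at a different color.
        if not (0 <= r < len(grid) and 0 <= c < len(grid[r])):
            return
        if grid[r][c] != old_color:
            return
        grid[r][c] = new_color
        fill(r + 1, c)
        fill(r - 1, c)
        fill(r, c + 1)
        fill(r, c - 1)

    fill(row, col)
    return grid


picture = [[0, 0, 1],
           [0, 1, 1],
           [1, 1, 0]]
print(flood_fill(picture, 0, 0, 2))
```

The work that belongs only to the first pass (reading the old color, the early return) lives in the outer function, so no "first run" flag is needed. For large grids a recursive helper can hit Python's recursion limit, so an explicit stack may be preferable there.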

Functions in Lua

I am starting to learn Lua from Programming in Lua (2nd edition), and I didn't understand the following in the book.
network = {
    {name ="grauna", IP="210.26.30.34"},
    {name ="araial", IP="210.26.30.23"},
}
If we want to sort the table by the field name, the author mentions
table.sort(network, function (a,b) return (a.name > b.name) end)
What's happening here? What does function (a,b) stand for? Is function a keyword or something?
I was playing around with it and created a table order
order={x=1,x=22,x=10} -- not sure this is legal
and then did
print (table.sort(order,function(a,b) return (a.x > b.x) end))
I did not get any output. Where am I going wrong?
Thanks
It's an anonymous function that takes two arguments and returns true if the name of the first argument is greater than the name of the second, so the table ends up in descending order by name. table.sort() calls this function on pairs of elements to decide which of the two should come first.
I think (but I am not sure) that order={x=1,x=22,x=10} has the same meaning in Lua as order={x=10}, a table with one key "x" associated with the value 10. Maybe you meant {{x=1},{x=22},{x=10}} to make an "array" of 3 components, each having the key "x".
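To answer the second part of your question: table.sort sorts the table in place and returns nothing, so there is nothing for print to show. Lua is also very small and doesn't provide a way to print a table directly. If you use a table as a list or array, you can do this:
print(unpack(some_table))
unpack({1, 2, 3}) returns 1, 2, 3. A very useful function. (In Lua 5.2 and later it is called table.unpack.)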
function is a keyword in Lua, similar to lambda in Scheme, Common Lisp and Python, or fun in OCaml. It introduces anonymous functions that can capture variables from the enclosing scope, i.e. closures.
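For readers who know Python better than Lua, a rough analog of the book's example is sketched below. Python's sorted takes a key function rather than a Lua-style comparison function, so this is the same idea and not a line-for-line translation. make_field_getter is a hypothetical helper added to show a closure.

```python
network = [
    {"name": "grauna", "IP": "210.26.30.34"},
    {"name": "araial", "IP": "210.26.30.23"},
]

# Anonymous function used as the sort key; reverse=True mirrors a.name > b.name.
by_name_descending = sorted(network, key=lambda entry: entry["name"], reverse=True)


def make_field_getter(field_name):
    # The returned lambda closes over field_name: a closure.
    return lambda entry: entry[field_name]


by_ip = sorted(network, key=make_field_getter("IP"))

print([entry["name"] for entry in by_name_descending])
print([entry["name"] for entry in by_ip])
```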

Disadvantages of using a lot of parameters

I am rewriting some code to make functional changes, and I am stuck in a situation where I will either need to overload a function to accommodate two or three types of parameters (while performing almost identical operations on them) or use one function with a lot of parameters. For now I am going with the latter option, and I wanted to know the specific disadvantages (if any) of using a function with a lot of parameters (and when I say a lot, I mean 15).
I am looking for a general answer, nothing language specific, so I am not mentioning the language here, but just for information, I am using C#.
Thanks
Rishi
The problem with a lot of parameters is that at the place where you call the code it can be difficult to see what the parameters mean:
// Uhh... what?
run(x, y, max_x, true, false, dx * 2, range_start, range_end, 0.01, true);
Some languages solve this by allowing named parameters and optional parameters with sensible defaults.
Another approach is to put your parameters in a parameter object with named members and then pass that single object as the argument to your function. This is a refactoring approach called Introduce Parameter Object.
You may also find it useful to put one group of related parameters that belong together into one class, and another group of parameters into a different class.
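A minimal Python sketch of Introduce Parameter Object, with related parameters grouped into two small classes. Viewport, SweepRange, their fields, and the body of run are all invented for illustration; the original run call does not say what its arguments mean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewport:
    x: float
    y: float
    max_x: float


@dataclass(frozen=True)
class SweepRange:
    start: float
    end: float
    step: float = 0.01  # sensible default, so callers can omit it


def run(viewport, sweep, wrap=True, verbose=False):
    """Hypothetical stand-in for the long run(...) call above."""
    steps = round((sweep.end - sweep.start) / sweep.step)
    return {"origin": (viewport.x, viewport.y), "steps": steps,
            "wrap": wrap, "verbose": verbose}


result = run(Viewport(x=0, y=0, max_x=100), SweepRange(start=0.0, end=1.0), verbose=True)
print(result)
```

At the call site every value now has a name, and the two flags can only be passed by keyword if the signature is tightened further with a bare `*`.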
Try to think as the person who will use the method would.
The best outcome is for the use of each argument to be easy to understand.
If not all arguments are used in all cases, you can:
use optional parameters (C# 4, for example, supports them)
use a struct or class to hold the parameters and only fill in the required properties
refactor your code. I don't know what your code does, but that looks to me like a huge number of parameters
If you're trying to write your code "the functional way" you might find "Currying" useful, and create meaningful functor objects that are initialized with just a couple of parameters. If a function takes a lot of parameters, their list may (or should) usually be divided into meaningful chunks, and currying should form a chain of functions with meaningful intent.
So instead of (using the example from another answer):
run(x, y, max_x, true, false, dx * 2, range_start, range_end, 0.01, true);
you might use
// initialize functors
run_in_userbox = run(x, y, max_x);
run_with_bounds = run_in_userbox(true, false);
iterate_within_bounds = run_with_bounds(dx * 2, range_start, range_end, 0.01);
result = iterate_within_bounds(true); //computation only starts here
I don't know if C# supports this, but that's how the problem is usually solved in functional languages.
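Python does not curry functions automatically, but functools.partial gives a close relative (partial application) that produces the same chain. In this sketch the parameter names and the body of run are assumptions, since the original call gives only positional values.

```python
from functools import partial


def run(x, y, max_x, wrap, clip, step, range_start, range_end, tolerance, verbose):
    """Hypothetical stand-in: just reports what it was called with."""
    return {"box": (x, y, max_x), "flags": (wrap, clip),
            "bounds": (step, range_start, range_end, tolerance), "verbose": verbose}


# Each stage fixes one meaningful group of arguments; nothing runs yet.
run_in_userbox = partial(run, 0, 0, 100)
run_with_bounds = partial(run_in_userbox, True, False)
iterate_within_bounds = partial(run_with_bounds, 0.5, 0.0, 10.0, 0.01)

result = iterate_within_bounds(True)  # the computation only happens here
print(result)
```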
The way I normally handle this is to have very small separate methods for each signature needed, but have them call private methods to do the actual work, which as you said is pretty much identical between the use cases.

How do you return two values from a single method?

When you're in a situation where you need to return two things from a single method, what is the best approach?
I understand the philosophy that a method should do one thing only, but say you have a method that runs a database select and you need to pull two columns. I'm assuming you only want to traverse the database result set once, but you want to return two columns' worth of data.
The options I have come up with:
1. Use global variables to hold returns. I personally try to avoid globals where I can.
2. Pass in two empty variables as parameters, then assign the variables inside the method, which is now void. I don't like the idea of methods that have side effects.
3. Return a collection that contains two variables. This can lead to confusing code.
4. Build a container class to hold the double return. This is more self-documenting than a collection containing other collections, but it seems like it might be confusing to create a class just for the purpose of a return.
This is not entirely language-agnostic: in Lisp, you can actually return any number of values from a function, including (but not limited to) none, one, two, ...
(defun returns-two-values ()
  (values 1 2))
The same thing holds for Scheme and Dylan. In Python, I would actually use a tuple containing 2 values like
def returns_two_values():
    return (1, 2)
As others have pointed out, you can return multiple values using the out parameters in C#. In C++, you would use references.
void
returns_two_values(int& v1, int& v2)
{
    v1 = 1; v2 = 2;
}
In C, your method would take pointers to locations, where your function should store the result values.
void
returns_two_values(int* v1, int* v2)
{
    *v1 = 1; *v2 = 2;
}
For Java, I usually use either a dedicated class or a pretty generic little helper (currently, there are two in my private "commons" library: Pair<F,S> and Triple<F,S,T>, both nothing more than simple immutable containers for two and three values, respectively).
I would create data transfer objects. If it is a group of information (first and last name) I would make a Name class and return that. #4 is the way to go. It seems like more work up front (which it is), but makes it up in clarity later.
If it is a list of records (rows in a database) I would return a Collection of some sort.
I would never use globals unless the app is trivial.
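A minimal Python sketch of the data transfer object idea, using the question's scenario of pulling two columns in a single pass. The customers table, its columns, and the sample rows are invented for illustration; sqlite3 with an in-memory database is used only so that the example is self-contained.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    first: str
    last: str


def fetch_names(connection):
    """One pass over the result set; each row becomes one Name object."""
    rows = connection.execute(
        "SELECT first_name, last_name FROM customers ORDER BY id"
    )
    return [Name(first, last) for first, last in rows]


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
connection.executemany(
    "INSERT INTO customers (first_name, last_name) VALUES (?, ?)",
    [("Ada", "Lovelace"), ("Alan", "Turing")],
)
names = fetch_names(connection)
print(names)
```

The method still returns one thing, a collection of records, and each record carries the two related columns under readable names.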
Not my own thoughts (Uncle Bob's):
If there's cohesion between those two variables, I've heard him say, you're missing a class in which those two are fields. (He said the same thing about functions with long parameter lists.)
On the other hand, if there is no cohesion, then the function does more than one thing.
I think the preferred approach is to build a container (be it a class or a struct; if you don't want to create a separate class for this, a struct is the way to go) that will hold all the values to be returned.
In the C/C++ world it would actually be quite common to pass two variables by reference (your option 2).
I think it all depends on the scenario.
Thinking from a C# mentality:
1: I would avoid globals as a solution to this problem, as they are widely regarded as bad practice.
4: If the two return values are tied together in a way that lets them exist as an object in their own right, then you can return a single object that holds the two values. If this object is only being designed and used as this method's return type, then it likely isn't the best solution.
3: A collection is a great option if the returned values are the same type and can be thought of as a collection. However, if the specific example needs 2 items, and each item is its own thing (maybe one represents the beginning of something and the other represents the end) and the returned items are not used interchangeably, then this may not be the best option.
2: I like this option the best if 4 and 3 do not make sense for your scenario. As stated in 3, if you wanted to get two objects that represent the beginning and end items of something, then I would use parameters by reference (or out parameters, again depending on how it's all being used). This way your parameters can explicitly define their purpose: MethodCall(ref object StartObject, ref object EndObject)
Personally I try to use languages that allow functions to return something more than a simple integer value.
First, you should distinguish what you want: an arbitrary-length return or fixed-length return.
If you want your method to return an arbitrary number of values, you should stick to returning a collection, because collections--whatever your language is--are designed for exactly that task.
But sometimes you just need to return two values. How does returning two values--when you're sure it's always two values--differ from returning one value? It doesn't, I say! And modern languages, including Perl, Ruby, C++, Python, OCaml, etc., allow a function to return tuples, either built in or as third-party syntactic sugar (yes, I'm talking about boost::tuple). It looks like this:
tuple<int, int, double> add_multiply_divide(int a, int b) {
    return make_tuple(a+b, a*b, double(a)/double(b));
}
Specifying an "out parameter" is, in my opinion, overused because of the limitations of older languages and the paradigms learned in those days. But there are still many cases where it's useful (if your method needs to modify an object passed as a parameter, that object not being an instance of the class that contains the method).
The conclusion is that there's no generic answer--each situation has its own solution. But there is one common thing: a function returning several items does not violate any paradigm. That's a language limitation that somehow got transferred to the human mind.
Python (like Lisp) also allows you to return any number of values from a function, including (but not limited to) none, one, two
def quadcube (x):
    return x**2, x**3
a, b = quadcube(3)
Some languages make doing #3 native and easy. Example: Perl. "return ($a, $b);". Ditto Lisp.
Barring that, check if your language has a collection suited to the task, such as pair/tuple in C++.
Barring that, create a pair/tuple class and/or collection and re-use it, especially if your language supports templating.
If your function has return value(s), it's presumably returning it/them for assignment to either a variable or an implied variable (to perform operations on, for instance.) Anything you can usefully express as a variable (or a testable value) should be fair game, and should dictate what you return.
Your example mentions a row or a set of rows from a SQL query. Then you reasonably should be ready to deal with those as objects or arrays, which suggests an appropriate answer to your question.
When you're in a situation where you need to return two things in a single method, what is the best approach?
It depends on WHY you are returning two things.
Basically, as everyone here seems to agree, #2 and #4 are the two best answers...
I understand the philosophy that a method should do one thing only, but say you have a method that runs a database select and you need to pull two columns. I'm assuming you only want to traverse through the database result set once, but you want to return two columns worth of data.
If the two pieces of data from the database are related, such as a customer's First Name and Last Name, I would indeed still consider this to be doing "one thing."
On the other hand, suppose you have come up with a strange SELECT statement that returns your company's gross sales total for a given date, and also reads the name of the customer that placed the first sale for today's date. Here you're doing two unrelated things!
If it's really true that the performance of this strange SELECT statement is much better than doing two SELECT statements for the two different pieces of data, and both pieces of data really are needed on a frequent basis (so that the entire application would be slower if you didn't do it that way), then using this strange SELECT might be a good idea - but you had better be prepared to demonstrate why your way really makes a difference in perceived response time.
The options I have come up with:
1 Use global variables to hold returns. I personally try and avoid globals where I can.
There are some situations where creating a global is the right thing to do. But "returning two things from a function" is not one of those situations. Doing it for this purpose is just a Bad Idea.
2 Pass in two empty variables as parameters then assign the variables inside the method, which now is a void.
Yes, that's usually the best idea. This is exactly why "by reference" (or "output", depending on which language you're using) parameters exist.
I don't like the idea of methods that have side effects.
Good theory, but you can take it too far. What would be the point of calling SaveCustomer() if that method didn't have a side-effect of saving the customer's data?
By Reference parameters are understood to be parameters that contain returned data.
3 Return a collection that contains two variables. This can lead to confusing code.
True. It wouldn't make sense, for instance, to return an array where element 0 was the first name and element 1 was the last name. This would be a Bad Idea.
4 Build a container class to hold the double return. This is more self-documenting than a collection containing other collections, but it seems like it might be confusing to create a class just for the purpose of a return.
Yes and no. As you say, I wouldn't want to create an object called FirstAndLastNames just to be used by one method. But if there was already an object which had basically this information, then it would make perfect sense to use it here.
If I was returning two of the exact same thing, a collection might be appropriate, but in general I would usually build a specialized class to hold exactly what I needed.
And if you are returning two things from those two columns today, tomorrow you might want a third. Maintaining a custom object is going to be a lot easier than any of the other options.
Use var/out parameters or pass variables by reference, not by value. In Delphi:
function ReturnTwoValues(out Param1: Integer):Integer;
begin
  Param1 := 10;
  Result := 20;
end;
If you use var instead of out, you can pre-initialize the parameter.
With databases, you could have an out parameter per column and the result of the function would be a boolean indicating if the record is retrieved correctly or not. (Although I would use a single record class to hold the column values.)
As much as it pains me to do it, I find the most readable way to return multiple values in PHP (which is what I work with, mostly) is using a (multi-dimensional) array, like this:
function doStuff($someThing)
{
    // do stuff
    $status = 1;
    $message = 'it worked, good job';
    return array('status' => $status, 'message' => $message);
}
Not pretty, but it works and it's not terribly difficult to figure out what's going on.
I generally use tuples. I mainly work in C#, and it's very easy to design generic tuple constructs there. I assume it would be very similar in most languages that have generics. As an aside, 1 is a terrible idea, and 3 only works when the two return values are the same type, unless you work in a language where everything derives from the same basic type (i.e. object). 2 and 4 are also good choices. 2 doesn't introduce any side effects a priori; it's just unwieldy.
Use std::vector, QList, or some managed library container to hold however many X you want to return:
QList<X> getMultipleItems()
{
    QList<X> returnValue;
    for (int i = 0; i < countOfItems; ++i)
    {
        returnValue.push_back(<your data here>);
    }
    return returnValue;
}
For the situation you described, pulling two fields from a single table, the appropriate answer is #4 given that two properties (fields) of the same entity (table) will exhibit strong cohesion.
Your concern that "it might be confusing to create a class just for the purpose of a return" is probably not that realistic. If your application is non-trivial you are likely going to need to re-use that class/object elsewhere anyway.
You should also consider whether the design of your method is primarily returning a single value, and you are getting another value for reference along with it, or if you really have a single returnable thing like first name - last name.
For instance, you might have an inventory module that queries the number of widgets you have in inventory. The return value you want to give is the actual number of widgets. However, you may also want to record how often someone queries the inventory and return the number of queries so far. In that case it can be tempting to return both values together. Remember, though, that you have class variables available for storing data, so you can keep an internal query count without returning it every time, and use a second method call to retrieve it. Only group the two values together if they are truly related. If they are not, use separate methods to retrieve them separately.
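The inventory example can be sketched in a few lines of Python. The Inventory class and its method names are hypothetical; the point is only that the query count is kept as internal state and fetched through its own method.

```python
class Inventory:
    def __init__(self, widget_count):
        self._widget_count = widget_count
        self._query_count = 0  # internal state, not returned with every answer

    def widget_count(self):
        """The primary value: how many widgets are in stock."""
        self._query_count += 1
        return self._widget_count

    def query_count(self):
        """The secondary value, retrieved only when someone actually wants it."""
        return self._query_count


inventory = Inventory(widget_count=42)
inventory.widget_count()
inventory.widget_count()
print(inventory.query_count())
```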
Haskell also allows multiple return values using built-in tuples:
sumAndDifference :: Int -> Int -> (Int, Int)
sumAndDifference x y = (x + y, x - y)
> let (s, d) = sumAndDifference 3 5 in s * d
-16
Because Haskell is a pure language, options 1 and 2 are not available.
Even using a state monad, the return value contains (at least conceptually) a bag of all relevant state, including any changes the function just made. It's just a fancy convention for passing that state through a sequence of operations.
I will usually opt for approach #4, as I prefer the clarity of knowing that what the function produces or calculates is its return value (rather than byref parameters). It also lends itself to a rather "functional" style in program flow.
The disadvantage of option #4 with generic tuple classes is it isn't much better than returning a collection (the only gain is type safety).
public IList CalculateStuffCollection(int arg1, int arg2)
public Tuple<int, int> CalculateStuffTuple(int arg1, int arg2)
var resultCollection = CalculateStuffCollection(1,2);
var resultTuple = CalculateStuffTuple(1,2);
resultCollection[0] // Was it index 0 or 1 I wanted?
resultTuple.A // Was it A or B I wanted?
I would like a language that allowed me to return an immutable tuple of named variables (similar to a dictionary, but immutable, typesafe and statically checked). Sadly, such an option isn't available to me in the world of VB.NET, though it may be elsewhere.
I dislike option #2 because it breaks that "functional" style and forces you back into a procedural world (when often I don't want to do that just to call a simple method like TryParse).
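As one example of the "immutable tuple of named values" wished for above, Python's typing.NamedTuple comes close. The field and function names below are made up, and the static checking only happens if an external type checker is run over the code; Python itself does not enforce the annotations.

```python
from typing import NamedTuple


class Stuff(NamedTuple):
    total: int
    product: int


def calculate_stuff(arg1: int, arg2: int) -> Stuff:
    """Return two results under names instead of positions."""
    return Stuff(total=arg1 + arg2, product=arg1 * arg2)


result = calculate_stuff(1, 2)
print(result.total, result.product)  # no guessing between index 0 and 1

total, product = result  # still unpacks like a plain tuple
```

Assigning to result.total raises AttributeError, so the returned value cannot be modified after the fact.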
I have sometimes used continuation-passing style to work around this: pass a function value as an argument, and have the method return the result of calling that function with the multiple values.
In languages without first-class functions, an object can take the place of the function value.
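A small Python sketch of the continuation-passing idea, covering both variants: a plain function as the continuation, and an object standing in for it. sum_and_difference and Collector are illustrative names, not part of any library.

```python
def sum_and_difference(x, y, continuation):
    """Hand both results to the caller-supplied function instead of returning a pair."""
    return continuation(x + y, x - y)


# Variant 1: the continuation is a function value.
product = sum_and_difference(3, 5, lambda s, d: s * d)


# Variant 2: an object plays the role of the function value.
class Collector:
    def __call__(self, total, difference):
        self.total = total
        self.difference = difference
        return self


collected = sum_and_difference(3, 5, Collector())
print(product, collected.total, collected.difference)
```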
My choice is #4: define a reference parameter in your function that refers to a value object.
In PHP:
class TwoValuesVO {
    public $expectedOne;
    public $expectedTwo;
}
/* parameter $_vo refers to a TwoValuesVO instance */
function twoValues( & $_vo ) {
    $_vo->expectedOne = 1;
    $_vo->expectedTwo = 2;
}
In Java:
class TwoValuesVO {
    public int expectedOne;
    public int expectedTwo;
}
class TwoValuesTest {
    void twoValues( TwoValuesVO vo ) {
        vo.expectedOne = 1;
        vo.expectedTwo = 2;
    }
}