XQuery - calculating number of elements - function

I'm trying to declare a user-defined function in XQuery, that would be passed an element and would return the total number of elements in its tree (meaning itself plus its subtree).
Is this even possible to do in XQuery with a recursive function or will I need another approach?

Yes, this is possible. As this smells like homework, I'm not giving a full answer, only the idea of how to do it.
For both approaches, you'll have to decide which kinds of child nodes to count. Reading your question, it looks like you want elements only and can safely ignore attributes, comments, text nodes and processing instructions.
Using a Recursive Function
Define a function that returns 1 (for the element itself) plus the sum of the sizes of its child subtrees (which you determine by a recursive function call). Something like (this is not XQuery code!):
function subtree_size(element) {
  1 + sum(
    for each child element of element
    return subtree_size(child element)
  )
}
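To make the recursion concrete without giving away the XQuery, here is a minimal Python sketch of the same idea. It assumes the standard library's xml.etree.ElementTree, whose default parser keeps elements and drops comments and processing instructions; the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET


def subtree_size(element):
    """Count the element itself plus every element below it."""
    # Iterating over an Element yields its child elements only
    # (text is stored on the elements, not as separate nodes).
    return 1 + sum(subtree_size(child) for child in element)


# Hypothetical sample document: four elements (a, b, c, d).
doc = ET.fromstring("<a><b><c/></b><d/>some text<!-- a comment --></a>")
print(subtree_size(doc))
```

A leaf element has no children, so the sum is empty and the function returns 1, which is the base case of the recursion.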
Passing all Elements to the count Function
XQuery has a count function, which returns the number of items in the sequence passed to it. There is a very short and rather easy to find XPath expression that returns all descendant nodes (including the node itself). Have a look at the axis steps available in XPath.
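The non-recursive idea can be sketched in Python as well (again not XQuery). This assumes xml.etree.ElementTree, where Element.iter() visits the element itself and then every element below it, and findall(".//*") returns only the elements below it; the sample document is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: four elements (a, b, c, d).
doc = ET.fromstring("<a><b><c/></b><d/>some text</a>")

# Count everything the "self plus all descendants" traversal yields.
count_with_iter = sum(1 for _ in doc.iter())

# Equivalent: count the descendants, then add one for the element itself.
count_with_path = len(doc.findall(".//*")) + 1

print(count_with_iter, count_with_path)
```

Both counts agree because the two traversals select the same set of elements; only the starting element is handled differently.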

What does an expression tree look like (when function calls are involved)?

I've found many places that show expression trees involving operators (+, -, *, &&, ||, etc.). Here is a simple example:
But I cannot find an example where functions (with zero or more arguments) are involved.
How would the following expression be represented using an expression tree?
mid( "This is a string", 1*2, ceil( 4.2 ) ) == "is i"
Thanks a million in advance.
After weeks of research, I was not able to find the "official" (academic) answer to this question. So I took my own path, and I can say it works smoothly.
I'm offering it here because so far no one has given an answer, just in case it helps someone.
By asking this question, I wanted to know whether I should place the arguments passed to a function as child nodes of the 'function node' or as a property (data) of the 'function node'.
After evaluating the pros and cons of both options, and since nodes in an AST can store as much information as you need (the two children 'left' and 'right' are just the minimum), I decided that storing the arguments as data was the easiest approach; it is easy to implement and it works perfectly.
This was my choice: place the arguments of the function as data in the 'function node'.
But if anyone has a better answer, please share it here.
It might help to think of an expression tree as already being a way of representing functions applied to a set of arguments. For example, a - node has two children, which you can think of as representing the two ordered inputs to the “minus” function.
With that in mind, you can generalize your expression tree by allowing each node to contain an arbitrary function with one child per argument to the function. For example, if you have a function max that returns the maximum of two values, the max node would have two children. If you have a function median that takes three arguments and returns the median, it would have three children.
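As a rough illustration of "one child per argument", here is a minimal Python sketch that builds a tree for the expression from the question. The Node class, the const and call helpers, the FUNCTIONS table and especially the semantics chosen for mid (zero-based start plus length) are assumptions made for this example, not something the question specifies.

```python
import math
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Node:
    kind: str                  # "const" or "call"
    value: Any = None          # the constant, or the function/operator name
    children: List["Node"] = field(default_factory=list)  # one per argument


def const(value):
    return Node("const", value)


def call(name, *args):
    return Node("call", name, list(args))


# Operators are treated as ordinary functions of their operands.
FUNCTIONS = {
    "*": lambda a, b: a * b,
    "==": lambda a, b: a == b,
    "ceil": math.ceil,
    # Assumed semantics for mid: zero-based start, then a length.
    "mid": lambda s, start, length: s[start:start + length],
}


def evaluate(node):
    if node.kind == "const":
        return node.value
    # Evaluate the children (arguments) first, then apply the function.
    args = [evaluate(child) for child in node.children]
    return FUNCTIONS[node.value](*args)


# mid( "This is a string", 1*2, ceil( 4.2 ) ) == "is i"
tree = call("==",
            call("mid",
                 const("This is a string"),
                 call("*", const(1), const(2)),
                 call("ceil", const(4.2))),
            const("is i"))
result = evaluate(tree)
```

In this layout the == node has two children, the mid node has three, and a zero-argument function would simply be a call node with no children. Whether result is true depends entirely on how mid is defined.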

Acquiring all the ancestors or descendants of a matched node

I need to travel through all the ancestors or descendants of a matched AST node so that I can later use that information to modify parts of the input source code.
I tried to look for ways to do that. I looked at the getParents member function of the ASTContext class. I could use that to go up the AST hierarchy and visit all the ancestor nodes of my currently matched node. The problem is that when I get the parent node, I no longer have the context for that node to try and get its parent. I could try to rebuild the ASTContext for the new node, but that seems to be another big task on its own, if it is possible at all.
The lowest NodeKind (lowest in the C hierarchy) I'm looking for is a callExpr and the highest I'm looking for is a functionDecl.
How can I obtain all the ancestors or descendants of a matched AST node after the match returns control to run in MatchCallback?
It may be possible to keep reaching for a parent declaration recursively until you reach TranslationUnitDecl, however, I would instead suggest actually iterating over the declarations in TranslationUnitDecl and working your way toward the FunctionDecl instead.
You can create a recursive function which finds all TagDecl in a translation unit, searches all methods in that class for the FunctionDecl you specify, and also recursively consumes TagDecls within that TagDecl, until you have nothing left to consume.
This would allow you to more easily keep a complete record of the specific AST nodes you want, and would probably be less confusing to write.
However, if you choose to work your way backward, you can try something like this (untested):
FunctionDecl *FD; // assumed to point at the matched declaration
DeclContext *PC = FD->getParent();
// Walk outward through the enclosing contexts until the translation unit.
while (PC && !PC->isTranslationUnit()) {
    // consume PC
    PC = PC->getParent();
}
For descendants (children), you'll just have to cast to a type that has children and iterate over them.

How to set the value of a variable in a udf requiring a path-call

I'm trying to use a user-defined function in XSLT that repeatedly calls the value of a certain string. That string is based on the outcome of an XPath expression that doesn't change within the span of a single function call. I thought it would be a good idea to assign it to a variable rather than look it up over and over again.
Unfortunately, at least in Saxon's implementation, you cannot use an XPath expression requiring a node inside a function, even one based on an absolute path, without first using a throw-away line to let the function know you are discussing the root document rather than some other one.
So, for example, the following code throws an error:
<xsl:function name="udf:LeafMatch">
  <xsl:param name="sID"></xsl:param>
  <xsl:variable name="myLeaf" select="/potato/stem[@sessionID=$sID][scc]/scc/@leafnumber"/>
Normally, the solution is just to first reference any global variable to give context. For example, the following works inside a udf ($root is a variable bound to the root node):
<xsl:for-each select="$root">
  <xsl:value-of select="/potato/stem[@sessionID=$sID][scc]/scc/@leafnumber"/>
</xsl:for-each>
But this doesn't work when trying to use XPath to set the value of a variable, because I'm not allowed to put the expression within a for-each.
I also tried using
<xsl:choose><xsl:when select="$root"><xsl:value-of select="/potato/stem[@sessionID=$sID][scc]/scc/@leafnumber"/></xsl:when></xsl:choose>
to give it context, going on what I saw here: http://www.stylusstudio.com/xsllist/200504/post00240.html
That didn't work either.
FWIW, passing the variable into the function is problematic because the XPath expression used to define "myLeaf" depends on the context node, and I don't know how to get XPath to select one path based on values in the current context node.
For example, in the code calling this function I have something like:
<xsl:for-each select="/potato/stem[eye]">
  <leaf = "{udf:LeafMatch(@sessionID)}"/>
</xsl:for-each>
I'm working in the context of a /potato/stem[eye] node and using the udf to look for a /potato/stem[scc] node that has the same value of @sessionID. I don't know how to reference the value of @sessionID from the current context node in the predicate of an XPath expression searching for other nodes in a completely different part of the XML tree, so I was using a udf to do that. It was working fine until I decided to try to use a variable for the string rather than having the processor look it up each time.
I was trying to avoid going one level deeper (having my function itself call a named template or putting a named template inside my original for-each and having that named template call a function).
So my questions are:
A. For a user-defined function, how do I set a variable that depends on an XPath expression?
B. Is there a snazzy way in Xpath to use values drawn from the current content node in the predicates of the Xpath expression you are trying to test?
So my questions are:
A. For a user-defined function, how do I set a variable that depends
on an XPath expression?
B. Is there a snazzy way in Xpath to use values drawn from the current
content node in the predicates of the Xpath expression you are trying
to test?
Both questions are quite unclear.
A: I assume you actually mean:
"Inside an xsl:function, how do I define a variable that depends on
the context node?"
The answer: You can't. By definition there is no context node within an xsl:function. This is defined by the W3C XSLT 2.0 specification in the following way:
"Within the body of a stylesheet function, the focus is initially
undefined; this means that any attempt to reference the context item,
context position, or context size is a non-recoverable dynamic error.
[XPDY0002]"
You can, however, pass as a parameter the intended context node (or just the document node that must be used as current). Or, alternatively, you may refer to a globally defined variable.
B: This question is completely not understandable:
What is "snazzy"?
What is "current content node"? Please, provide an example of a specific task to be accomplished in the wanted "snazzy" way.

What is the name of a function whose result depends only on its parameters?

I'm writing a toy compiler thingy which can optimise function calls if the result depends only on the values of the arguments. So functions like xor and concatenate depend only on their inputs, calling them with the same input always gives the same output. But functions like time and rand depend on "hidden" program state, and calling them with the same input may give different output. I'm just trying to figure out what the adjective is that distinguishes these two types of function, like "isomorphic" or "re-entrant" or something. Can someone tell me the word I'm looking for?
The term you are looking for is "pure":
http://en.wikipedia.org/wiki/Pure_function
I think it's called Pure Function:
In computer programming, a function may be described as pure if both these statements about the function hold:
The function always evaluates the same result value given the same argument value(s). The function result value cannot depend on any hidden information or state that may change as program execution proceeds or between different executions of the program, nor can it depend on any external input from I/O devices.
Evaluation of the result does not cause any semantically observable side effect or output, such as mutation of mutable objects or output to I/O devices.
The result value need not depend on all (or any) of the argument values. However, it must depend on nothing other than the argument values.
I guess you could say the adjective is "pure" if you go by "pure function".
I always learnt that a function whose output is always the same when the arguments are always the same is called "deterministic". Personally, I feel that that is a more descriptive term. I guess a "pure function" is by definition deterministic, and it seems a pure function is also required to not have any side-effects. I assume that that need not be the case for all deterministic functions (as long as the return value is always the same for the same arguments).
Wikipedia link: http://en.wikipedia.org/wiki/Deterministic_algorithm
Quote:
Given a particular input, it will always produce the same output, and the underlying machine will always pass through the same sequence of states.
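The distinctions drawn in these answers can be sketched in Python. The function names below are made up for illustration, and functools.lru_cache stands in for the kind of call optimisation the question describes; it is only safe to apply to the pure function.

```python
import functools
import random


def xor(a, b):
    """Pure: the result depends only on the arguments; no side effects."""
    return a ^ b


call_log = []


def xor_logged(a, b):
    """Deterministic but not pure: same result every time, but it also
    mutates call_log, which is an observable side effect."""
    call_log.append((a, b))
    return a ^ b


def jitter(a):
    """Neither: the result depends on the hidden state of the random generator."""
    return a + random.random()


@functools.lru_cache(maxsize=None)
def cached_xor(a, b):
    # Caching is safe here because xor is pure: skipping the second call
    # changes nothing that the rest of the program can observe.
    return xor(a, b)


cached_xor(6, 3)
cached_xor(6, 3)   # served from the cache, xor is not called again
xor_logged(6, 3)
xor_logged(6, 3)   # caching this instead would silently drop a log entry
```

This is why an optimiser usually needs "pure" rather than just "deterministic": eliminating a repeated call to xor_logged would preserve the return value but lose the second log entry.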

Is there any way to implement a linked list with indexed access too?

I need some sort of linked list structure, but it would be great if it had indexed access too.
Is there any way to accomplish that?
EDIT: I'm writing in C, but it may be for any language.
One way to achieve your goal is to implement a randomized or deterministic skip list. On the bottom level you have your linked list with your items.
To reach elements by index, you'll need to add information to the nodes on the upper levels: how many bottom-level nodes lie between this node and the next node on the same level. This information can be added and maintained in O(log n).
The complexity of this solution is:
Add, Remove and Go to index all work in O(log n).
The downside of this solution is that it is much more difficult to implement than a regular linked list. With a regular linked list, you get Add and Remove in O(1), and Go to index in O(n).
You can probably use a tree for what you are aiming at. Make a binary tree that maintains the weights of each node of the tree (where the weight is equal to the number of nodes attached to that node, including itself). If you have a balancing scheme available for the tree, then insertions are still O(log n), since you only need to add one to the ancestor nodes' weights. Getting a node by index is O(log n), since you need only look at the indices of the ancestors of your desired node and the two children of each of those ancestors.
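A minimal Python sketch of the weighted-tree idea follows. The names SizedNode, insert_at and get_at are made up for this example, and balancing is deliberately left out to keep it short, so the O(log n) bounds only hold once a balancing scheme is added.

```python
class SizedNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
        self.size = 1  # weight: nodes in this subtree, including this one


def size(node):
    return node.size if node else 0


def insert_at(node, index, value):
    """Insert value so it ends up at position index; returns the subtree root."""
    if node is None:
        return SizedNode(value)
    left = size(node.left)
    if index <= left:
        node.left = insert_at(node.left, index, value)
    else:
        # Skip the left subtree and this node itself.
        node.right = insert_at(node.right, index - left - 1, value)
    node.size += 1  # every ancestor on the path gains one node
    return node


def get_at(node, index):
    """Return the value at position index using only the stored weights."""
    while node is not None:
        left = size(node.left)
        if index < left:
            node = node.left
        elif index == left:
            return node.value
        else:
            index -= left + 1
            node = node.right
    raise IndexError("index out of range")


root = None
for position, letter in enumerate("abcde"):
    root = insert_at(root, position, letter)
root = insert_at(root, 2, "X")  # insert in the middle
values = [get_at(root, i) for i in range(size(root))]
```

The position of a node is never stored; it is derived on the way down from the sizes of the left subtrees, which is why an insertion only has to update the weights along one root-to-leaf path.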
To get array-like indexing in languages such as C++ or Python, you would overload the array indexing operator [] for the class that implements the linked list. Java does not support operator overloading, so there you would expose a method such as get(index) instead. Either way, the implementation would be O(n). Since operator overloading is not possible in C either, you would have to write a function that takes the linked list and a position and returns the corresponding object.
If faster indexed access is required, you would have to use a different data structure, such as the tree suggested by jprete or a dynamic array (which automatically grows as new elements are added to it). A quick example would be std::vector in the C++ standard library.
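A minimal Python sketch of this O(n) approach, where overloading [] means defining __getitem__. The class names ListNode and LinkedList are made up for the example.

```python
class ListNode:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


class LinkedList:
    def __init__(self, values=()):
        self.head = None
        self.length = 0
        # Build back to front so the list keeps the input order.
        for value in reversed(list(values)):
            self.head = ListNode(value, self.head)
            self.length += 1

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        """Called for items[index]; walks index links from the head, so O(n)."""
        if not 0 <= index < self.length:
            raise IndexError("index out of range")
        node = self.head
        for _ in range(index):
            node = node.next
        return node.value


items = LinkedList(["a", "b", "c"])
print(items[0], items[2], len(items))
```

The indexing syntax hides the cost: items[i] inside a loop over all i is quadratic, which is the reason to consider the tree or skip list when indexed access is frequent.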
SQL Server row items in the clustered index are arranged like so:
      .
     / \
    /\ /\
   *-*-*-*
The linked list is in the leaves (*-*-*-*). The linked list is ordered, allowing fast directional scanning, and the tree serves as a "road map" into the linked list. So you would need a key-value pair for your items and then a data structure that encapsulates the tree and the linked list.
So your data structure might look something like this:
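/* kv_pair and value_type are assumed to be defined elsewhere. */
typedef struct ll_node ll_node;
typedef struct tree_node tree_node;

struct ll_node
{
    kv_pair current;
    ll_node *next;
};

struct tree_node
{
    value_type value;
    short isLeaf;               /* nonzero: the *_leaf members are valid */
    union
    {
        tree_node *left_child;
        kv_pair *left_leaf;
    };                          /* anonymous unions require C11 */
    union
    {
        tree_node *right_child;
        kv_pair *right_leaf;
    };
};

struct indexed_ll
{
    tree_node *tree_root;
    ll_node *linked_list_tail;
};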