Finding whether two lists are structurally equal using a Scheme function

Write a Scheme predicate function that tests for the structural equality of two given lists. Two lists are structurally equal if they have the same list structure, although their atoms may be different.
(1 2 3) (4 5 6) is ok
(1 (2 3)) ((1 2) 3) is not ok
I have no idea how to do this. Any help would be appreciated.

Here are some hints. This one is a bit repetitive to write, and because the question looks like homework, I'll let you fill in the details:
(define (structurally-equal l1 l2)
  (cond (? ; if both lists are null
         #t)
        (? ; if one of the lists is null but the other is not
         #f)
        (? ; if the `car` part of both lists is an atom
         (structurally-equal (cdr l1) (cdr l2)))
        (? ; if the `car` part of one of the lists is an atom but the other is not
         #f)
        (else
         (and (structurally-equal ? ?)      ; recur over the `car` of each list
              (structurally-equal ? ?))))) ; recur over the `cdr` of each list

There are two ways you could approach this. The first one uses a function to generate an output that represents the list structure.
Think of a way that you could represent the structure of any list as a unique string or number, such that any lists with identical structure would have the same representation and no other list would generate the same output.
Write a function that analyses any list's structure and generates that output.
Run both lists through the function and compare the output. If the same, they have the same structure.
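A minimal Python sketch of the first approach, assuming Python lists stand in for Scheme lists and anything that is not a list counts as an atom. The helper name `shape` is hypothetical; it maps every atom to `*` and keeps the parentheses, so the resulting string depends only on the structure:

```python
def shape(x):
    """Return a string that describes only the list structure of x.

    Atoms become "*" and lists keep their parentheses, so the string is
    the same for two inputs exactly when their structure is the same.
    """
    if isinstance(x, list):
        return "(" + " ".join(shape(e) for e in x) + ")"
    return "*"  # any non-list value is treated as an atom


def structurally_equal(a, b):
    # Compare the structural signatures of both lists.
    return shape(a) == shape(b)


print(structurally_equal([1, 2, 3], [4, 5, 6]))      # True
print(structurally_equal([1, [2, 3]], [[1, 2], 3]))  # False
```

Here `shape([1, [2, 3]])` gives `(* (* *))` and `shape([[1, 2], 3])` gives `((* *) *)`. Because the signature is a plain string, it can also be stored or used as a hash key.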
The second one, which is the approach Oscar has taken, is to recur through both lists at the same time. Here, you pass both lists to one function, which does this:
Is the first element of the first list identical (structurally) to the first element of the second? If not, return false.
Are these first elements lists? If so, return the result of (and (recur on the first element of both lists) (recur on the rest of both lists))
If not, return the result of (recur on the rest of both lists).
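The simultaneous recursion can be sketched in Python the same way (again treating Python lists as Scheme lists; `same_structure` is a hypothetical name). It also spells out the base case that the steps leave implicit: two empty lists match, and an empty list never matches a non-empty one.

```python
def same_structure(a, b):
    """Recur through both lists at once; stop at the first difference."""
    if not a or not b:
        # Two empty lists match; an empty and a non-empty list do not.
        return not a and not b
    a_is_list = isinstance(a[0], list)
    b_is_list = isinstance(b[0], list)
    if a_is_list != b_is_list:
        return False  # one first element is a list, the other an atom
    if a_is_list:
        # Both first elements are lists: compare them, then the rests.
        return same_structure(a[0], b[0]) and same_structure(a[1:], b[1:])
    # Both first elements are atoms: only the rests remain to compare.
    return same_structure(a[1:], b[1:])
```

Slicing with `a[1:]` copies the tail, which keeps the sketch close to `car`/`cdr` but makes it quadratic for long lists; a Scheme version walks the `cdr` without copying.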
The second approach is more efficient in the simple circumstance where you want to compare two lists. It returns as soon as a difference is found, and only has to process both lists in their entirety when they are, indeed, structurally identical.
If you had a large collection of lists and might want to compare any two at any time, the first approach can be more efficient, as you can store the result and thus any list need only be processed once. It also allows you to:
organise your collection of lists by, for example, creating a hash map that groups together all lists with the same structure;
compare lists for similarity of structure (e.g. do these lists start and/or end with the same structure, even if they differ in the middle?).
I suspect, though, that your homework is best served by the second approach.

Related

Why did the designer make vectors, maps, and sets functions in Clojure?

Rich made vectors, maps, and sets functions, while lists and sequences are not functions.
Why can't all these collections be functions, to make it consistent?
Further, why don't we make all these composite data structures functions that map a position to their internal data?
If we made all these composite data structures functions, then there would be only functions and atomic data in Clojure. This would minimize the number of fundamental elements in the language, right?
I believe a minimal set of fundamental elements, ideally only two, would make the language simpler, more expressive and more flexible. Is this correct?
Vectors, maps, and sets are all associative data structures. Maps are the most obvious; they simply associate arbitrary keys with arbitrary values. A vector can be thought of as a map whose key set must be the set of all nonnegative integers less than the vector's size. Finally, sets can be thought of as maps that map keys to themselves.
It's important to understand that the sequential nature of a vector and the associative nature of a vector are two orthogonal things. It's a data structure that's designed to be good at supporting both abstractions (to some extent; for instance, you can't efficiently insert at the beginning of a vector).
Lists are simpler than vectors; they are finite sequential data structures, nothing more. A list can't efficiently return the element at a particular index, so it doesn't expose that functionality as part of its core interface. Of course, you can get an element of a list by index using nth, but in that case, you're explicitly treating it as a sequence, not as an associative structure.
So to answer your question, the IFn implementations for vectors, maps, and sets are there because of the extremely close relationship between the idea of an associative data structure and the idea of a pure function. Lists and other sequences are not inherently associative, so for consistency, they do not implement IFn.
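To make the "associative structure as a function" idea concrete outside Clojure, here is a rough Python analogue. The classes `FnVector`, `FnMap` and `FnSet` are hypothetical, not part of any library: a vector is callable with an index below its size, a map with a key, and a set maps each member to itself. Plain lists are deliberately left uncallable, mirroring Clojure lists.

```python
class FnVector(list):
    """A list callable with an index, like a Clojure vector."""

    def __call__(self, i):
        # Keys are exactly the integers 0 <= i < len(self); no negative indexing.
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self[i]


class FnMap(dict):
    """A dict callable with a key, like a Clojure map (None when missing)."""

    def __call__(self, key):
        return self.get(key)


class FnSet(frozenset):
    """A set callable with a value; members map to themselves."""

    def __call__(self, key):
        return key if key in self else None


v = FnVector(["partial", "+", 5])
print(v(2))  # 5
```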
Elogent's answer is excellent. There is one more reason that it wouldn't make sense for lists to be functions:
Literal lists already have a different, very important role, so they can't also be treated as functions in the way that vectors are.
Let's start with a vector containing two functions, partial and +, and a number, 5. We can treat the vector as a function, as you know, to return the value indexed by its argument:
user=> ([partial + 5] 2)
5
So far, so good. Suppose we want to use a list (partial + 5) in place of the vector, as you suggested, to return the value 5. Will we get an error message? No! But we won't get 5 as the result, either:
user=> ((partial + 5) 2)
7
What happened? (partial + 5) returned a function--the function that adds 5 to its single argument--and then this function was applied to the argument 2.
When a list is evaluated, its first element is evaluated and should return a function. If the first element is a symbol, it's evaluated, and then the function that's its value is applied to the arguments, which are the other elements of the list. If the first element of a list is itself a list, then it is evaluated in the same way that it would be evaluated if it were at the top level. The entire expression in that inner list should return a function, which will then be applied to the other elements of the outer list.
Since an inner list that's the first element of a list being evaluated already has this role, it can't also play the kind of role that vectors play as first elements.

Sorting more than 10 integer sequences by key with Thrust

I want to perform a sort_by_key where I have a single key sequence and multiple value sequences.
One usually performs this with
sort_by_key(
key,
key + N,
make_zip_iterator(
make_tuple(x1 , x2 , ...)
)
)
However, I want to perform a sort with more than 10 sequences, each of length N. Thrust does not support tuples with more than 10 elements, so is there a way around this?
Of course, one can keep a separate copy of the key vector and perform sorts on batches of 10 sequences. But I would like to do everything in a single call.
thrust::tuple is hardcoded to always have 10 elements, so there isn't a direct way to form a zip_iterator from more than ten individual iterators, and therefore no way of sorting more than 10 distinct iterators by key in a single fused operation (and implicitly no way of passing more than 10 iterators into a user functor as well).
If you really can't think of a useful way to combine some of the individual vectors into a single iterator (for example, by forming a vector of tuple values), then one alternative might be to use permutation iterators. If you create an array of indices from a counting iterator and sort it by key, something like:
device_vector<int> indices(N);
copy(make_counting_iterator(0), make_counting_iterator(N), indices.begin());
sort_by_key(key, key + N, indices.begin()); // the values argument must be an iterator, not the vector
indices now holds ordered indices into the vectors you would otherwise have sorted. You can then create a permutation iterator which can be used to "gather" the input data by your key as part of subsequent algorithm calls. You can make as many permutation iterators as needed, and they can be permutations of zip iterators to provide different "views" of the 12 input iterators as you need them in subsequent code.
Actually, you can use a simple gather operation. Perform only one thrust::sort_by_key, then apply thrust::gather to each data vector. Because indices[i] ends up holding the original position of the i-th smallest key, gather moves each value to its sorted location; thrust::scatter would instead need the inverse of that permutation.
thrust::sequence(indices.begin(), indices.end());
thrust::sort_by_key(keyvals.begin(), keyvals.end(), indices.begin());
// now indices[i] is the original position of the i-th sorted key
foreach ( ... ) {  // for each data vector
    // sorteddata[i] = data[indices[i]]
    thrust::gather(indices.begin(), indices.end(), data.begin(), sorteddata.begin());
}
Gather and scatter operations are quite powerful and open up many opportunities.
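The index bookkeeping can be checked on the CPU with a small Python sketch. Plain lists stand in for device vectors, and the helper names are hypothetical; the comments state which Thrust call each one is meant to mirror. With `indices` produced by sorting the sequence 0..N-1 by key, gather puts each data vector into key order, while scatter only gives the same result when fed the inverse permutation:

```python
def sort_indices_by_key(keys):
    """Mirror thrust::sequence + thrust::sort_by_key(keys, indices):
    indices[i] is the original position of the i-th smallest key.
    (Python's sort is stable; thrust::sort_by_key is not guaranteed to be.)"""
    return sorted(range(len(keys)), key=keys.__getitem__)


def gather(indices, data):
    # Mirrors thrust::gather: result[i] = data[indices[i]]
    return [data[j] for j in indices]


def scatter(data, indices):
    # Mirrors thrust::scatter: result[indices[i]] = data[i]
    result = [None] * len(data)
    for i, j in enumerate(indices):
        result[j] = data[i]
    return result


def inverse_permutation(p):
    # inv[p[i]] = i, so inv undoes the permutation p
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return inv


keys = [30, 10, 20]
data = ["c", "a", "b"]           # data[i] belongs with keys[i]
idx = sort_indices_by_key(keys)  # [1, 2, 0]
print(gather(idx, data))                        # ['a', 'b', 'c']
print(scatter(data, idx))                       # ['b', 'c', 'a'], not sorted
print(scatter(data, inverse_permutation(idx)))  # ['a', 'b', 'c']
```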

Pattern matching with associative and commutative operators

Pattern matching (as found in e.g. Prolog, the ML family languages and various expert system shells) normally operates by matching a query against data element by element in strict order.
In domains like automated theorem proving, however, there is a requirement to take into account that some operators are associative and commutative. Suppose we have data
A or B or C
and query
C or $X
Going by surface syntax this doesn't match, but logically it should match with $X bound to A or B because or is associative and commutative.
Is there any existing system, in any language, that does this sort of thing?
Associative-commutative pattern matching has been around since at least 1981, and is still a hot topic today.
There are lots of systems that implement this idea and make it useful; it means you can avoid writing complicated pattern matches when associativity or commutativity could be used to make the pattern match. Yes, it can be expensive; better the pattern matcher do this automatically than you do it badly by hand.
You can see an example in a rewrite system for algebra and simple calculus implemented using our program transformation system. In this example, the symbolic language to be processed is defined by grammar rules, and those rules that have A-C properties are marked. Rewrites on trees produced by parsing the symbolic language are automatically extended to match.
The Maude term rewriter implements associative and commutative pattern matching.
http://maude.cs.uiuc.edu/
I've never encountered such a thing, and I just had a more detailed look.
There is a sound computational reason for not implementing this by default - one has to essentially generate all combinations of the input before pattern matching, or you have to generate the full cross-product worth of match clauses.
I suspect that the usual way to implement this would be to simply write both patterns (in the binary case), i.e., have patterns for both C or $X and $X or C.
Depending on the underlying organisation of data (it's usually tuples), this pattern matching would involve rearranging the order of tuple elements, which would be weird (particularly in a strongly typed environment!). If it's lists instead, then you're on even shakier ground.
Incidentally, I suspect that the operation you fundamentally want is disjoint union patterns on sets, e.g.:
foo (Or ({C} disjointUnion {X})) = ...
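A minimal Python sketch of that disjoint-union view, assuming the operands of `or` have already been flattened into a list and that pattern constants only need to match by equality (the function `match_or` is hypothetical). The data is treated as a multiset; the pattern's constants are removed from it, and whatever remains is bound to `$X`:

```python
from collections import Counter


def match_or(data, constants):
    """Match 'c1 or ... or ck or $X' against the flattened operands in data.

    Returns the operands bound to $X (possibly empty), or None if some
    constant is missing. Operands are compared by equality only.
    """
    remaining = Counter(data)
    needed = Counter(constants)
    if any(remaining[c] < n for c, n in needed.items()):
        return None  # some constant in the pattern has no partner in the data
    remaining.subtract(needed)
    # Whatever is left goes to $X (sorted only to make the output deterministic).
    return sorted(remaining.elements())


print(match_or(["A", "B", "C"], ["C"]))  # ['A', 'B']
print(match_or(["A", "B"], ["C"]))       # None
```

This binds `$X` to A or B for the question's example. It also lets `$X` match nothing, which may or may not be what your logic wants.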
The only programming environment I've seen that deals with sets in any detail would be Isabelle/HOL, and I'm still not sure that you can construct pattern matches over them.
EDIT: It looks like Isabelle's `function` command (rather than `fun`) will let you define complex non-constructor patterns, except that then you have to prove that they are used consistently, and you can't use the code generator anymore.
EDIT 2: The way I implemented similar functionality over n commutative, associative and transitive operators was this:
My terms were of the form A | B | C | D, while queries were of the form B | C | $X, where $X was permitted to match zero or more things. I pre-sorted these using lexicographic ordering, so that variables always occurred in the last position.
First, you construct all pairwise matches, ignoring variables for now, and recording those that match according to your rules.
{ (B,B), (C,C) }
If you treat this as a bipartite graph, then you are essentially solving a marriage (bipartite matching) problem. There exist fast algorithms for finding these.
Assuming you find one, then you gather up everything that does not appear on the left-hand side of your relation (in this example, A and D), and you stuff them into the variable $X, and your match is complete. Obviously you can fail at any stage here, but this will mostly happen if there is no variable free on the RHS, or if there exists a constructor on the LHS that is not matched by anything (preventing you from finding a perfect match).
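The matching step can be sketched in Python, assuming the term and the pattern (minus its variable) are already flattened into lists and that `matches(p, t)` is whatever notion of subterm match you use (a hypothetical parameter). To keep it short, this uses the simple augmenting-path method (Kuhn's algorithm) rather than a faster algorithm such as Hopcroft-Karp; the leftover term elements are what `$X` receives:

```python
def ac_match(term, pattern, matches):
    """Assign each pattern element its own term element (a bipartite matching).

    term, pattern: flattened operand lists (pattern without its variable).
    matches(p, t): caller-supplied test for whether p may match t.
    Returns the unassigned term elements (for $X), or None if no matching
    covers every pattern element.
    """
    # candidates[i] lists the term positions pattern element i could match.
    candidates = [[j for j, t in enumerate(term) if matches(p, t)] for p in pattern]
    owner = [None] * len(term)  # owner[j] = pattern index assigned to term[j]

    def augment(i, seen):
        # Find a term element for pattern i, moving earlier assignments if needed.
        for j in candidates[i]:
            if j in seen:
                continue
            seen.add(j)
            if owner[j] is None or augment(owner[j], seen):
                owner[j] = i
                return True
        return False

    for i in range(len(pattern)):
        if not augment(i, set()):
            return None
    return [t for j, t in enumerate(term) if owner[j] is None]


equal = lambda p, t: p == t
print(ac_match(["A", "B", "C", "D"], ["B", "C"], equal))  # ['A', 'D']
print(ac_match(["A", "D"], ["B"], equal))                 # None
```

Augmenting paths matter when `matches` is looser than equality: a greedy assignment can take a term element that a later pattern element needed, and the algorithm reassigns it instead of failing.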
Sorry if this is a bit muddled. It's been a while since I wrote this code, but I hope this helps you, even a little bit!
For the record, this might not be a good approach in all cases. I had very complex notions of 'match' on subterms (i.e., not simple equality), and so building sets or anything would not have worked. Maybe that'll work in your case though and you can compute disjoint unions directly.

Examples of monoids/semigroups in programming

It is well known that monoids are stunningly ubiquitous in programming. They are so ubiquitous and so useful that I, as a 'hobby project', am working on a system that is completely based on their properties (distributed data aggregation). To make the system useful I need useful monoids :)
I already know of these:
Numeric or matrix sum
Numeric or matrix product
Minimum or maximum under a total order with a top or bottom element (more generally, join or meet in a bounded lattice, or even more generally, product or coproduct in a category)
Set union
Map union where conflicting values are joined using a monoid
Intersection of subsets of a finite set (or just set intersection if we speak about semigroups)
Intersection of maps with a bounded key domain (same here)
Merge of sorted sequences, perhaps with joining key-equal values in a different monoid/semigroup
Bounded merge of sorted lists (same as above, but we take the top N of the result)
Cartesian product of two monoids or semigroups
List concatenation
Endomorphism composition.
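As a rough illustration of two entries from that list, here is a Python sketch (all names hypothetical) of map union with a value monoid and of bounded merge of sorted lists, plus a generic fold. Associativity is what makes such a fold safe to split across machines and recombine in any grouping:

```python
from functools import reduce
from heapq import merge
from itertools import islice


def map_union(combine):
    """Dict union where values under a shared key are joined by `combine`
    (itself assumed associative). The identity element is {}."""
    def op(a, b):
        result = dict(a)
        for k, v in b.items():
            result[k] = combine(result[k], v) if k in result else v
        return result
    return op


def bounded_merge(n):
    """Merge two sorted lists and keep the n smallest elements.
    [] is the identity for lists of length at most n."""
    def op(a, b):
        return list(islice(merge(a, b), n))
    return op


def mconcat(op, identity, xs):
    # Fold with the monoid; associativity means any grouping gives the same result.
    return reduce(op, xs, identity)


counts = mconcat(map_union(lambda x, y: x + y), {}, [{"a": 1}, {"a": 2, "b": 3}, {"b": 4}])
print(counts)  # {'a': 3, 'b': 7}
top3 = mconcat(bounded_merge(3), [], [[1, 5, 9], [2, 3], [4, 10]])
print(top3)    # [1, 2, 3]
```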
Now, let us define a quasi-property of an operation as a property that holds up to an equivalence relation. For example, list concatenation is quasi-commutative if we consider lists of equal length or with identical contents up to permutation to be equivalent.
Here are some quasi-monoids and quasi-commutative monoids and semigroups:
Any (a+b = a or b, if we consider all elements of the carrier set to be equivalent)
Any satisfying predicate (a+b = the one of a and b that is non-null and satisfies some predicate P, if none does then null; if we consider all elements satisfying P equivalent)
Bounded mixture of random samples (xs+ys = a random sample of size N from the concatenation of xs and ys; if we consider any two samples with the same distribution as the whole dataset to be equivalent)
Bounded mixture of weighted random samples
Let's call it "topological merge": given two acyclic and non-contradicting dependency graphs, a graph that contains all the dependencies specified in both. For example, list "concatenation" that may produce any permutation in which elements of each list follow in order (say, 123+456=142356).
Which others do exist?
Quotient monoid is another way to form monoids (quasimonoids?): given monoid M and an equivalence relation ~ compatible with multiplication, it gives another monoid. For example:
finite multisets with union: if A* is a free monoid (lists with concatenation), ~ is "is a permutation of" relation, then A*/~ is a free commutative monoid.
finite sets with union: If ~ is modified to disregard count of elements (so "aa" ~ "a") then A*/~ is a free commutative idempotent monoid.
syntactic monoid: Any regular language gives rise to a syntactic monoid that is the quotient of A* by the "indistinguishability by the language" relation. Here is a finger tree implementation of this idea. For example, the language $\{a^{3n} : n \in \mathbb{N}\}$ has $\mathbb{Z}_3$ as its syntactic monoid.
Quotient monoids automatically come with a surjective homomorphism M -> M/~.
A "dual" construction is submonoids. They come with an injective homomorphism A -> M.
Yet another construction on monoids is tensor product.
Monoids allow exponentiation by squaring in O(log n) and fast parallel prefix sum computation. They are also used in the Writer monad.
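Exponentiation by squaring needs nothing beyond associativity and an identity, so it can be written once for any monoid. A minimal Python sketch (`mpow` and `mat2_mul` are hypothetical names), used here with 2x2 integer matrix multiplication and with string concatenation:

```python
def mpow(op, identity, x, n):
    """Combine n copies of x with op using O(log n) calls to op.

    Only associativity and an identity are needed (i.e. a monoid); the
    factors are all powers of x, so their order never matters here.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    result, base = identity, x
    while n:
        if n & 1:
            result = op(result, base)
        base = op(base, base)  # x, x^2, x^4, ...
        n >>= 1
    return result


def mat2_mul(a, b):
    # 2x2 matrix product, an example of a non-commutative monoid operation.
    return [[a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
            [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]]]


identity2 = [[1, 0], [0, 1]]
print(mpow(mat2_mul, identity2, [[1, 1], [1, 0]], 10))  # [[89, 55], [55, 34]]
print(mpow(lambda s, t: s + t, "", "ab", 3))           # ababab
```

The matrix result contains the Fibonacci numbers F(11), F(10) and F(9).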
The Haskell standard library is alternately praised and attacked for its use of the actual mathematical terms for its type classes. (In my opinion it's a good thing, since without it I'd never even know what a monoid is!). In any case, you might check out http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Monoid.html for a few more examples:
the dual of any monoid is a monoid: given a+b, define a new operation ++ with a++b = b+a
conjunction and disjunction of booleans
over the Maybe monad (aka "option" in OCaml), first and last. That is, first (Just a) b = Just a and first Nothing b = b, and likewise for last.
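A Python sketch of the dual construction and of first/last, assuming `None` plays the role of `Nothing` (so, unlike Haskell's `Maybe`, a genuine `None` value can't be stored). The names are hypothetical; `last` is defined as the dual of `first`:

```python
from functools import reduce


def dual(op):
    """Dual monoid: same identity, arguments swapped (a ++ b = b + a)."""
    return lambda a, b: op(b, a)


def first(a, b):
    """Keep the first non-None value; None (standing in for Nothing) is the identity."""
    return a if a is not None else b


last = dual(first)  # keeps the last non-None value

values = [None, 2, 3, None]
print(reduce(first, values, None))  # 2
print(reduce(last, values, None))   # 3
```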
The latter is just the tip of the iceberg of a whole family of monoids related to monads and arrows, but I can't really wrap my head around these (other than simply monadic endomorphisms). But a google search on monads monoids turns up quite a bit.
A really useful example of a commutative monoid is unification in logic and constraint languages. See section 2.8.2.2 of 'Concepts, Techniques and Models of Computer Programming' for a precise definition of a possible unification algorithm.
Good luck with your language! I'm doing something similar with a parallel language, using monoids to merge subresults from parallel computations.
Arbitrary length Roman numeral value computation.
https://gist.github.com/4542999

What's a good way to structure variable nested loops?

Suppose you're working in a language with variable length arrays (e.g. with A[i] for all i in 1..A.length) and have to write a routine that takes n (n : 1..8) variable length arrays of items in a variable length array of length n, and needs to call a procedure with every possible length n array of items where the first is chosen from the first array, the second is chosen from the second array, and so forth.
If you want something concrete to visualize, imagine that your routine has to take data like:
[ [ 'top hat', 'bowler', 'derby' ], [ 'bow tie', 'cravat', 'ascot', 'bolo'] ... ['jackboots','galoshes','sneakers','slippers']]
and make the following procedure calls (in any order):
try_on ['top hat', 'bow tie', ... 'jackboots']
try_on ['top hat', 'bow tie', ... 'galoshes']
:
try_on ['derby','bolo',...'slippers']
This is sometimes called a Chinese menu problem, and for fixed n it can be coded quite simply (e.g. for n = 3, in pseudocode):
procedure register_combination( items : array [1..3] of vararray of An_item)
    for each i1 from items[1]
        for each i2 from items[2]
            for each i3 from items[3]
                register( [i1,i2,i3] )
But what if n can vary, giving a signature like:
procedure register_combination( items : vararray of vararray of An_item)
The code as written contained an ugly case statement, which I replaced with a much simpler solution. But I'm not sure it's the best (and it's surely not the only) way to refactor this.
How would you do it? Clever and surprising are good, but clear and maintainable are better--I'm just passing through this code and don't want to get called back. Concise, clear and clever would be ideal.
Edit: I'll post my solution later today, after others have had a chance to respond.
Teaser: I tried to sell a recursive solution, but they wouldn't go for it, so I had to stick to writing Fortran in a HLL.
The answer I went with, posted below.
Either the recursive algorithm
procedure register_combination( items )
    register_combination2( [], items )
procedure register_combination2( head, items )
    if items == []
        print head
    else
        for i in items[0]
            register_combination2( head ++ [i], items[1:] )
or the same with tail calls optimised out, using an array for the indices, and incrementing the last index until it reaches the length of the corresponding array, then carrying the increment up.
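The second option, an array of indices that is incremented like an odometer, might look like this in Python (`each_combination` and its `register` callback are hypothetical names):

```python
def each_combination(items, register):
    """Call register(combo) for every way of picking one element from each list.

    idx works like an odometer: bump the last index, and when it wraps past
    its list's length, reset it to 0 and carry into the index to its left.
    """
    if any(len(lst) == 0 for lst in items):
        return  # some list is empty, so there are no combinations
    idx = [0] * len(items)
    while True:
        register([lst[i] for lst, i in zip(items, idx)])
        pos = len(items) - 1
        while pos >= 0:
            idx[pos] += 1
            if idx[pos] < len(items[pos]):
                break
            idx[pos] = 0  # wrap and carry into the next position to the left
            pos -= 1
        if pos < 0:
            return  # carried out of the leftmost position: all done


each_combination([["top hat", "bowler"], ["bow tie", "cravat"]], print)
```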
Recursion.
Or, better yet, try to eliminate the recursion using stack-like structures and while statements.
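One way to replace the recursion with an explicit stack, sketched in Python (names hypothetical): each stack entry is a partial combination, playing the role of `head` in a recursive version, and popping an entry replaces a recursive call.

```python
def each_combination_stack(items, register):
    stack = [[]]  # partial combinations still to be extended
    while stack:
        head = stack.pop()
        depth = len(head)
        if depth == len(items):
            register(head)  # one item chosen from every list
            continue
        # Push in reverse so combinations come out in nested-loop order.
        for item in reversed(items[depth]):
            stack.append(head + [item])
```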
For the problem as you stated it (calling a function with variable arguments), it depends entirely on the programming language you're coding in; many of them allow passing variable arguments.
Since they were opposed to recursion (don't ask) and I was opposed to messy case statements (which, as it turned out, were hiding a bug) I went with this:
procedure register_combination( items : vararray of vararray of An_item)
    possible_combinations = 1
    for each item_list in items
        possible_combinations = possible_combinations * item_list.length
    for i from 0 to possible_combinations-1
        index = i
        this_combination = []
        for each item_list in items
            item_from_this_list = index mod item_list.length
            this_combination << item_list[item_from_this_list]
            index = index div item_list.length
        register(this_combination)
Basically, I figure out how many combinations there are, assign each one a number, and then loop through the number producing the corresponding combination. Not a new trick, I suspect, but one worth knowing.
It's shorter, works for any practical combination of list lengths (if there are over 2^60 combinations, they have other problems), isn't recursive, and doesn't have the bug.
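A runnable Python rendering of the same mixed-radix idea (`register_combinations` is a hypothetical name, and `register` is passed in as a callback). Python integers are unbounded, so the 2^60 remark doesn't apply here, though the number of calls still grows as the product of the list lengths:

```python
def register_combinations(items, register):
    # The total number of combinations is the product of the list lengths.
    possible_combinations = 1
    for item_list in items:
        possible_combinations *= len(item_list)
    for i in range(possible_combinations):
        index = i
        this_combination = []
        for item_list in items:
            # Read i as a mixed-radix number; each digit picks one item.
            index, digit = divmod(index, len(item_list))
            this_combination.append(item_list[digit])
        register(this_combination)
```

If any list is empty the product is 0, so the loop never runs and no division by zero occurs. In Python specifically, `itertools.product(*items)` yields the same combinations (in a different order) and is the idiomatic choice where it's available.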