How to test if collection contains all elements of other collection - ceylon

With a Set in Ceylon it is straightforward to determine if one collection is a superset of the other. It's just first.superset(second). What's the best way to do the equivalent for an Iterable, List, or Sequential using multiset (or bag) semantics? For example something like the pseudocode below:
{'a', 'b', 'b', 'c'}.containsAll({'b', 'a'}) // Should be true
{'a', 'b', 'b', 'c'}.containsAll({'a', 'a'}) // Should be false

There is Category.containsEvery, which is inherited by Iterable. It checks for each element of the parameter whether it is contained in the receiver, so that bigger.containsEvery(smaller) is equivalent to this:
smaller.every(bigger.contains)
(Note that it is swapped around.) The expression in the brackets here is a method reference; we could also write it expanded as a lambda:
smaller.every(o => bigger.contains(o))
So in your example:
print({'a', 'b', 'b'}.containsEvery({'b', 'a'})); // Should be true
print({'a', 'b', 'b'}.containsEvery({'a', 'a'})); // Should be false
... actually, those both return true. Why do you think the latter one is false?
Did you think of multiset semantics (i.e. the number of occurrences in the "superset" iterable need to be at least as much as the smaller one)? Or do you want a sublist? Or do you just want to know whether the second iterable is at the start of the first (startswith)?
I don't know about any multiset implementation for Ceylon (I found a multimap, though). If you are running on the JVM, you can use any Java one, like from Guava (though that also doesn't have a "contains all with multiples" function, as far as I can see).
For small iterables, you can use .frequencies() and then compare the numbers:
Boolean isSuperMultiset<Element>({Element*} bigger,
{Element*} smaller) =>
let (bigFreq = bigger.frequencies())
every({ for(key->count in smaller.frequencies())
count <= (bigFreq[key] else 0) })
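For comparison, here is the same frequency-count idea sketched in Python rather than Ceylon, using the standard library's collections.Counter. The name is_super_multiset is only an illustrative choice, and the sketch assumes the elements are hashable.

```python
from collections import Counter


def is_super_multiset(bigger, smaller):
    """Return True if every element of `smaller` occurs in `bigger`
    at least as many times as it occurs in `smaller`."""
    big_freq = Counter(bigger)
    # Counter yields 0 for missing keys, mirroring `bigFreq[key] else 0`.
    return all(count <= big_freq[key]
               for key, count in Counter(smaller).items())


print(is_super_multiset(['a', 'b', 'b', 'c'], ['b', 'a']))  # True
print(is_super_multiset(['a', 'b', 'b', 'c'], ['a', 'a']))  # False
```

Both iterables are traversed once, so the cost is linear in their combined length, at the price of building two frequency maps.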
For sublist semantics, the SearchableList interface has the includes method, which checks whether another list is a sublist. (It is not implemented by many classes, though, you would need to convert your first iterable into an Array, assuming it is not a String/StringBuilder.)
For startsWith semantics, you could convert both to lists and then use List.startsWith. There should be a more efficient way of doing that (you could just go through both iterators in parallel).
There is corresponding, but it just stops after the shorter one ends (i.e. it answers the question "does any of those two iterables start with the other", without telling which one is the longer one). Same for a bunch of other pair related functions in ceylon.language.
If you know the length of both of the Iterables (or are confident that .size is fast), that should solve the issue:
Boolean startsWith<Element>({Element*}longer, {Element*}shorter) =>
shorter.size <= longer.size &&
corresponding(longer, shorter);
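The "go through both iterators in parallel" idea mentioned earlier avoids needing .size at all. A minimal sketch in Python rather than Ceylon follows; the function name starts_with and the sentinel _MISSING are illustrative assumptions, not part of any Ceylon API.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking an exhausted iterable


def starts_with(longer, shorter):
    """Return True if `longer` begins with all elements of `shorter`, in order."""
    for a, b in zip_longest(longer, shorter, fillvalue=_MISSING):
        if b is _MISSING:
            return True   # `shorter` ran out first: every element matched
        if a is _MISSING or a != b:
            return False  # `longer` ran out early, or the elements differ
    return True  # both ended at the same time


print(starts_with("abbc", "ab"))  # True
print(starts_with("ab", "abbc"))  # False
```

Because it returns as soon as the shorter iterable is exhausted, this version also works when the longer one is lazy or very large.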

If you have two Sequentials, then you can remove each right-hand element, one at a time, from the left-hand sequence until you either remove them all or fail to remove one of them.
Boolean containsAll<Element>([Element*] collection, [Element*] other)
given Element satisfies Object {
variable value remaining = collection;
for (element1 in other) {
value position = remaining.locate((element2) => element1 == element2);
if (exists position) {
remaining = remaining.initial(position.key).append(remaining.spanFrom(position.key + 1));
} else {
// Element was not found in remaining; terminate early
return false;
}
}
// All elements were found
return true;
}
print(containsAll(['a', 'b', 'b', 'c'], ['a', 'b']));
print(containsAll(['a', 'b', 'b', 'c'], ['a', 'a']));
Append only exists on Sequential so it won't work on just a List or an Iterable.

The containsEvery function should do what you want. Alternatively, you can also turn both streams into sets using the set function, or use every and contains.

Related

synthesizing long parameter strings

When consuming a JSON string, the parameters can be deeply nested, making reading/checking tedious:
update(capture_created: params[:data][:object][:created], capture_currency: params[:data][:object][:currency]
...[...] and so on...
In what way can a node params[:data][:object] be represented only once and be thus able to handle the child values as a parameter?
There are a few things you can do.
You could grab the inner hash in a local variable as dbugger mentioned:
p = params[:data][:object]
update(capture_created: p[:created], capture_currency: p[:currency], ...)
Or you could use #tap or #then (depending on what return value you want from the expression):
# This evaluates to params[:data][:object]
params[:data][:object].tap do |p|
update(capture_created: p[:created], capture_currency: p[:currency], ...)
end
# This evaluates to whatever update returns
params[:data][:object].then do |p|
update(capture_created: p[:created], capture_currency: p[:currency], ...)
end
If the keys in the nested hash only need to be consistently renamed (i.e. add a "capture_" prefix) then #transform_keys:
update(params[:data][:object].transform_keys { |k| "capture_#{k}" })
is an option. String keys are fine with an ActiveRecord #update call but you could get symbols if you really want them:
update(params[:data][:object].transform_keys { |k| :"capture_#{k}" })
You might want to include a Hash#slice call if you want to ensure that you're only accessing certain keys:
update(params[:data][:object].slice(:created, :currency, ...).transform_keys { |k| :"capture_#{k}" })

Initialising Sequential values with for loop?

Is there any way to initialize a Sequential value other than in one fell swoop?
Like, can I declare it, then use a for loop to populate it, step by step?
As this could all happen inside a class body, the true immutability of the Sequential value could then kick in once the class instance construction phase has been completed.
Example:
Sequential<String> strSeq;
for (i in span(0,10)) {
strSeq[i] = "hello";
}
This code doesn't work, as I get this error:
Error:(12, 9) ceylon: illegal receiving type for index expression:
'Sequential' is not a subtype of 'KeyedCorrespondenceMutator' or
'IndexedCorrespondenceMutator'
So what I can conclude is that sequences must be assigned in one statement, right?
Yes, several language guarantees hinge on the immutability of sequential objects, so that immutability must be guaranteed by the language – it can’t just trust you that you won’t mutate it after the initialization is done :)
Typically, what you do in this situation is construct some sort of collection (e. g. an ArrayList from ceylon.collection), mutate it however you want, and then take its .sequence() when you’re done.
Your specific case can also be written as a comprehension in a sequential literal:
String[] strSeq = [for (i in 0..10) "hello"];
The square brackets used to create a sequence literal accept not only a comma-separated list of values, but also a for-comprehension:
String[] strSeq = [for (i in 0..10) "hello"];
You can also do both at the same time, as long as the for-comprehension comes last:
String[] strSeq = ["hello", "hello", for (i in 0..8) "hello"];
In this specific case, you could also do this:
String[] strSeq = ["hello"].repeat(11);
You can also get a sequence of sequences via nesting:
String[][] strSeqSeq = [for (i in 0..2) [for (j in 0..2) "hello"]];
And you can do the cartesian product (notice that the nested for-comprehension here isn't in square brackets):
[Integer, Character][] pairs = [for (i in 0..2) for (j in "abc") [i, j]];
Foo[] is an abbreviation for Sequential<Foo>, and x..y translates to span(x, y).
If you know upfront the size of the sequence you want to create, then a very efficient way is to use an Array:
value array = Array.ofSize(11, "");
for (i in 0:11) {
array[i] = "hello";
}
String[] strSeq = array.sequence();
On the other hand, if you don't know the size upfront, then, as described by Lucas, you need to use either:
a comprehension, or
some sort of growable array, like ArrayList.

golang return multiple values issue

I was wondering why this is valid go code:
func FindUserInfo(id string) (Info, bool) {
it, present := all[id]
return it, present
}
but this isn't
func FindUserInfo(id string) (Info, bool) {
return all[id]
}
is there a way to avoid the temporary variables?
To elaborate on my comment, the Effective Go mentions that the multi-value assignment from accessing a map key is called the "comma ok" pattern.
Sometimes you need to distinguish a missing entry from a zero value. Is there an entry for "UTC" or is that the empty string because it's not in the map at all? You can discriminate with a form of multiple assignment.
var seconds int
var ok bool
seconds, ok = timeZone[tz]
For obvious reasons this is called the “comma ok” idiom. In this example, if tz is present, seconds will be set appropriately and ok will be true; if not, seconds will be set to zero and ok will be false.
We can see that this differs from calling a regular function where the compiler would tell you that something is wrong:
package main
import "fmt"
func multiValueReturn() (int, int) {
return 0, 0
}
func main() {
fmt.Println(multiValueReturn)
asgn1, _ := multiValueReturn()
asgn2 := multiValueReturn()
}
On the playground this will output
# command-line-arguments
/tmp/sandbox592492597/main.go:14: multiple-value multiValueReturn() in single-value context
This gives us a hint that it may be something the compiler is doing. Searching the source code for "commaOk" gives us a few places to look, including types.unpack
At the time of writing, the method's godoc reads:
// unpack takes a getter get and a number of operands n. If n == 1, unpack
// calls the incoming getter for the first operand. If that operand is
// invalid, unpack returns (nil, 0, false). Otherwise, if that operand is a
// function call, or a comma-ok expression and allowCommaOk is set, the result
// is a new getter and operand count providing access to the function results,
// or comma-ok values, respectively. The third result value reports if it
// is indeed the comma-ok case. In all other cases, the incoming getter and
// operand count are returned unchanged, and the third result value is false.
//
// In other words, if there's exactly one operand that - after type-checking
// by calling get - stands for multiple operands, the resulting getter provides
// access to those operands instead.
//
// If the returned getter is called at most once for a given operand index i
// (including i == 0), that operand is guaranteed to cause only one call of
// the incoming getter with that i.
//
The key bits of this being that this method appears to determine whether or not something is actually a "comma ok" case.
Digging into that method tells us that it checks whether the mode of the operand is indexing a map or whether the mode is set to commaok (where this is defined doesn't give us many hints on when it's used, but searching the source for assignments to commaok we can see it's used when getting a value from a channel and in type assertions). Remember those three cases for later!
if x0.mode == mapindex || x0.mode == commaok {
// comma-ok value
if allowCommaOk {
a := [2]Type{x0.typ, Typ[UntypedBool]}
return func(x *operand, i int) {
x.mode = value
x.expr = x0.expr
x.typ = a[i]
}, 2, true
}
x0.mode = value
}
allowCommaOk is a parameter to the function. Checking out where unpack is called in that file we can see that all callers pass false as an argument. Searching the rest of the repository leads us to assignments.go in the Checker.initVars() method.
l := len(lhs)
get, r, commaOk := unpack(func(x *operand, i int) { check.expr(x, rhs[i]) }, len(rhs), l == 2 && !returnPos.IsValid())
Since it seems that we can only use the "comma ok" pattern to get two return values when doing a multi-value assignment this seems like the right place to look! In the above code the length of the left hand side is checked, and when unpack is called the allowCommaOk parameter is the result of l == 2 && !returnPos.IsValid(). The !returnPos.IsValid() is somewhat confusing here as that would mean that the position has no file or line information associated with it, but we'll just ignore that.
Further down in that method we've got:
var x operand
if commaOk {
var a [2]Type
for i := range a {
get(&x, i)
a[i] = check.initVar(lhs[i], &x, returnPos.IsValid())
}
check.recordCommaOkTypes(rhs[0], a)
return
}
So what does all of this tell us?
Since the unpack method takes an allowCommaOk parameter that's hardcoded to false everywhere except in assignments.go's Checker.initVars() method, we can probably assume that you will only ever get two values when doing an assignment with two variables on the left-hand side.
The unpack method determines whether or not you actually get an ok value in return by checking if you are indexing a map, grabbing a value from a channel, or doing a type assertion.
Since you can only get the ok value when doing an assignment it looks like in your specific case you will always need to use variables
You may save a couple of key strokes by using named returns:
func FindUserInfo(id string) (i Info, ok bool) {
i, ok = all[id]
return
}
But apart from that, I don't think what you want is possible.
Simply put: the reason why your second example isn't valid Go code is because the language specification says so. ;)
Indexing a map only yields a secondary value in an assignment to two variables. A return statement is not an assignment.
An index expression on a map a of type map[K]V used in an assignment or initialization of the special form
v, ok = a[x]
v, ok := a[x]
var v, ok = a[x]
yields an additional untyped boolean value. The value of ok is true if the key x is present in the map, and false otherwise.
Furthermore, indexing a map is not a "single call to a multi-valued function", which is one of the three ways to return values from a function (the second one, the other two not being relevant here):
There are three ways to return values from a function with a result type:
The return value or values may be explicitly listed in the "return" statement. Each expression must be single-valued and assignable to the corresponding element of the function's result type.
The expression list in the "return" statement may be a single call to a multi-valued function. The effect is as if each value returned from that function were assigned to a temporary variable with the type of the respective value, followed by a "return" statement listing these variables, at which point the rules of the previous case apply.
The expression list may be empty if the function's result type specifies names for its result parameters. The result parameters act as ordinary local variables and the function may assign values to them as necessary. The "return" statement returns the values of these variables.
As for your actual question: the only way to avoid temporary variables would be using non-temporary variables, but usually that would be quite unwise - and probably not much of an optimization even when safe.
So, why doesn't the language specification allow this kind of special use of map indexing (or type assertion or channel receive, both of which can also utilize the "comma ok" idiom) in return statements? That's a good question. My guess: to keep the language specification simple.
I'm no Go expert, but I believe you are getting a compile-time error when you try to return the map lookup directly, i.e. return all[id]. The reason could be that the function's return type is explicitly declared as (Info, bool), and when you write return all[id] the compiler can't map the single value of all[id] to (Info, bool).
However, in the solution mentioned above, the variables being returned, i and ok, are the same ones named in the function's result list (i Info, ok bool), and hence the compiler knows what it's returning, as opposed to a signature that declares just (Info, bool).
By default, maps in golang return a single value when accessing a key
https://blog.golang.org/go-maps-in-action
Hence, return all[id] won't compile for a function that expects 2 return values.

How to apply key function on tuple elems in sorted()?

We know both of these work for sorted():
sorted(['second', 'first', 'third'])
sorted([('first','second'), ('second', 'first'), ('first', 'third')])
By sorting the second one, the tuples are compared lexicographically; the first items are compared; if they are the same then the second items are compared, and so on.
But how to apply a key function on all the individual strings (or anything else there) for sorted which works for both containers and works recursively in the second case? Let's say func converts 'first' to 3, 'second' to 1 and 'third' to 2. I want this result:
['second', 'third', 'first']
[('second', 'first'), ('first','second'), ('first', 'third')]
I made this function to use as key, but I don't like the type checking in it, since it applies func only on strings, which is not a general solution:
def recursively_apply_func_on_strings(target, func,
                                      fargs=(), fkwargs={}):
    if isinstance(target, str):
        return func(target, *fargs, **fkwargs)
    result, f = [], recursively_apply_func_on_strings
    for elem in target:
        result.append(f(elem, func, fargs, fkwargs))
    return tuple(result)

sorted(sequence, key=lambda x: recursively_apply_func_on_strings(x, func))
Is there a cleaner way to do this?
Well, despite my comment saying otherwise, I think there are a few possible ways to improve things.
One idea is to make your function a key-function factory. This way you won't need a lambda to apply it with extra arguments in your sorted call.
Another idea is to apply func to all non-iterable values (plus strings), using the abstract Iterable type from the collections.abc module to test against.
Here's some code:
from collections.abc import Iterable

def recursive_key(func, fargs=(), fkwargs={}):
    def key_func(target):
        # Strings are iterable too, so treat them as leaf values explicitly.
        if isinstance(target, str) or not isinstance(target, Iterable):
            return func(target, *fargs, **fkwargs)
        return tuple(key_func(item) for item in target)
    return key_func
You'd call it like this (sorting by hexadecimal integer value, rather than string value):
sorted([('a', 'F'), ('A', 'd')], key=recursive_key(int, (16,)))
Note that we're calling recursive_key, and its return value (a.k.a. key_func) is what is being passed as the key parameter to sorted.
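To check this against the example in the question, here is a self-contained sketch. The rank function is a hypothetical stand-in for the unspecified func (mapping 'first' to 3, 'second' to 1 and 'third' to 2), and fkwargs defaults to None here only to avoid a shared mutable default.

```python
from collections.abc import Iterable


def recursive_key(func, fargs=(), fkwargs=None):
    fkwargs = fkwargs or {}

    def key_func(target):
        # Strings are iterable, so they are handled as leaf values explicitly.
        if isinstance(target, str) or not isinstance(target, Iterable):
            return func(target, *fargs, **fkwargs)
        return tuple(key_func(item) for item in target)

    return key_func


def rank(word):
    # Hypothetical stand-in for the question's `func`.
    return {'first': 3, 'second': 1, 'third': 2}[word]


flat = sorted(['second', 'first', 'third'], key=recursive_key(rank))
nested = sorted([('first', 'second'), ('second', 'first'), ('first', 'third')],
                key=recursive_key(rank))
print(flat)    # ['second', 'third', 'first']
print(nested)  # [('second', 'first'), ('first', 'second'), ('first', 'third')]
```

The same key function handles both the flat list and the list of tuples, because tuples are mapped to tuples of ranks and then compared lexicographically.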

Best way to cache results of method with multiple parameters - Object as key in Dictionary?

At the beginning of a method I want to check if the method is called with these exact parameters before, and if so, return the result that was returned back then.
At first, with one parameter, I used a Dictionary, but now I need to check 3 parameters (a String, an Object and a boolean).
I tried making a custom Object like so:
var cacheKey:Object = { identifier:identifier, type:type, someBoolean:someBoolean };
//if key already exists, return it (not working)
if (resultCache[cacheKey]) return resultCache[cacheKey];
//else: create result ...
//and save it in the cache
resultCache[cacheKey] = result;
But this doesn't work, because the second time the function is called, the new cacheKey is not the same object as the first, even though its properties are the same.
So my question is: is there a datatype that will check the properties of the object used as key for a matching key?
And what else is my best option? Create a cache for the keys as well? :/
Note there are two aspects to the technical solution: equality comparison and indexing.
The Cliff Notes version:
It's easy to do custom equality comparison
In order to perform indexing, you need to know more than whether one object is equal to another -- you need to know which object is "bigger" than the other.
If all of your properties are primitives you should squash them into a single string and use an Object to keep track of them (NOT a Dictionary).
If you need to compare some of the individual properties for reference equality you're going to have to write a function to determine which set of properties is bigger than the other, and then make your own collection class that uses the output of the comparison function to implement its own binary-search-tree-based indexing.
If the number of unique sets of arguments is in the several hundreds or less AND you do need reference comparison for your Object argument, just use an Array and the some method to do a naive comparison to all cached keys. Only you know how expensive your actual method is, so it's up to you to decide what lookup cost (which depends on the number of unique arguments provided to the function) is acceptable.
Equality comparison
To address equality comparison it is easy enough to write some code to compare objects for the values of their properties, rather than for reference equality. The following function enforces strict set comparison, so that both objects must contain exactly the same properties (no additional properties on either object allowed) with the same values:
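public static function propsEqual(obj1:Object, obj2:Object):Boolean {
    // Every property of obj1 must exist on obj2 with an equal value.
    for (var key1:* in obj1) {
        if (obj2[key1] === undefined)
            return false;
        if (obj1[key1] != obj2[key1])
            return false;
    }
    // obj2 must not have any property that obj1 lacks.
    for (var key2:* in obj2)
        if (obj1[key2] === undefined)
            return false;
    return true;
}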
You could speed it up by eliminating the second for loop with the tradeoff that {A:1, B:2} will be deemed equal to {A:1, B:2, C:'An extra property'}.
Indexing
The problem with this in your case is that you lose the indexing that a Dictionary provides for reference equality or that an Object provides for string keys. You would have to compare each new set of function arguments to the entire list of previously seen arguments, such as by using Array.some. I use the field currentArgs and the method compareToCurrent to avoid generating a new closure every time.
private var cachedArgs:Array = [];
private var currentArgs:Object;
function yourMethod(stringArg:String, objArg:Object, boolArg:Boolean):* {
currentArgs = { stringArg:stringArg, objArg:objArg, boolArg:boolArg };
var iveSeenThisBefore:Boolean = cachedArgs.some(compareToCurrent);
if(!iveSeenThisBefore)
cachedArgs.push(currentArgs);
}
function compareToCurrent(obj:Object):Boolean {
return someUtil.propsEqual(obj, currentArgs);
}
This means comparison will be O(n) time, where n is the ever increasing number of unique sets of function arguments.
If all the arguments to your function are primitive, see the very similar question In AS3, where do you draw the line between Dictionary and ArrayCollection?. The title doesn't sound very similar but the solution in the accepted answer (yes I wrote it) addresses the exact same technical issue -- using multiple primitive values as a single compound key. The basic gist in your case would be:
private var cachedArgs:Object = {};
function yourMethod(stringArg:String, objArg:Object, boolArg:Boolean):* {
var argKey:String = stringArg + objArg.toString() + (boolArg ? 'T' : 'F');
if(cachedArgs[argKey] === undefined)
cachedArgs[argKey] = _yourMethod(stringArg, objArg, boolArg);
return cachedArgs[argKey];
}
private function _yourMethod(stringArg:String, objArg:Object, boolArg:Boolean):* {
// Do stuff
return something;
}
If you really need to determine which reference is "bigger" than another (as the Dictionary does internally) you're going to have to wade into some ugly stuff, since Adobe has not yet provided any API to retrieve the "value" / "address" of a reference. The best thing I've found so far is this interesting hack: How can I get an instance's "memory location" in ActionScript?. Without doing a bunch of performance tests I don't know if using this hack to compare references will kill the advantages gained by binary search tree indexing. Naturally it would depend on the number of keys.