F# Random elements from a set<string> - function

I'm working on a project that requires me to write a function that selects a designated number of random elements from a set. Then map those elements to a variable for later comparison.
So in my scenario I have to select 5% of any given set.
let rec randomSet (a:Set<string>) =
let setLength = (a.Count / 100) * 5
let list = []
let rand = System.Random
if set.Length <> setLength then
// some code will go here
randomSet setLength eIDS
else
set
^Please criticize my code, I've only being coding in F# for a week.
I've tried to do it recursively but I have a feeling that it's the wrong way to go. I have tried other methods but they use the .take function, and thus the returned collection is the same every time.
Any ideas? I'm not after 1 element from a set, I'm after 5% of any set that's thrown at it.
This is not the same question as this :How can I select a random value from a list using F#
If you think it is, please explain.

There are multiple ways of doing this. Depending on the number of elements in the input and the number of items you want to pick, different strategy might be more efficient.
Probably the simplest way is to sort the input by a random number and then use take to get the required number of elements:
let data = [| 0 .. 1000 |]
let rnd = System.Random()
data
|> Seq.sortBy (fun _ -> rnd.Next())
|> Seq.take 50
This will randomly sort the sequence (which may be slow for large sequences), but then it takes precisely the number of elements you want (unlike Mark's solution, which will return roughly 5% of items).
If you wanted to select small number from a large list, it might be better to randomly generate indices (making sure that there are no duplicates) and then doing direct lookup based on the indices.

Since Set<'a> implements Seq<'a>, this question is, in fact, a duplicate of How can I select a random value from a list using F# All you'd need to do is to shuffle the set, take the first 5% elements, and put it back into a set.
Just for the fun of it, though, here's another solution. If you need to pick 5%, then first define a predicate that returns true only 5% of the times it's called:
let r = System.Random ()
let fivePercent _ = r.NextDouble () < 0.05
You can now filter your set using that predicate:
let randomlySelectedSubset = stringSet |> Seq.filter fivePercent |> set

Related

F#: How to Call a function with Argument Byref Int

I have this code:
let sumfunc(n: int byref) =
let mutable s = 0
while n >= 1 do
s <- n + (n-1)
n <- n-1
printfn "%i" s
sumfunc 6
I get the error:
(8,10): error FS0001: This expression was expected to have type
'byref<int>'
but here has type
'int'
So from that I can tell what the problem is but I just dont know how to solve it. I guess I need to specify the number 6 to be a byref<int> somehow. I just dont know how. My main goal here is to make n or the function argument mutable so I can change and use its value inside the function.
Good for you for being upfront about this being a school assignment, and for doing the work yourself instead of just asking a question that boils down to "Please do my homework for me". Because you were honest about it, I'm going to give you a more detailed answer than I would have otherwise.
First, that seems to be a very strange assignment. Using a while loop and just a single local variable is leading you down the path of re-using the n parameter, which is a very bad idea. As a general rule, a function should never modify values outside of itself — and that's what you're trying to do by using a byref parameter. Once you're experienced enough to know why byref is a bad idea most of the time, you're experienced enough to know why it might — MIGHT — be necessary some of the time. But let me show you why it's a bad idea, by using the code that s952163 wrote:
let sumfunc2 (n: int byref) =
let mutable s = 0
while n >= 1 do
s <- n + (n - 1)
n <- n-1
printfn "%i" s
let t = ref 6
printfn "The value of t is %d" t.contents
sumfunc t
printfn "The value of t is %d" t.contents
This outputs:
The value of t is 7
13
11
9
7
5
3
1
The value of t is 0
Were you expecting that? Were you expecting the value of t to change just because you passed it to a function? You shouldn't. You really, REALLY shouldn't. Functions should, as far as possible, be "pure" -- a "pure" function, in programming terminology, is one that doesn't modify anything outside itself -- and therefore, if you run it twice with the same input, it should produce the same output every time.
I'll give you a way to solve this soon, but I'm going to post what I've written so far right now so that you see it.
UPDATE: Now, here's a better way to solve it. First, has your teacher covered recursion yet? If he hasn't, then here's a brief summary: functions can call themselves, and that's a very useful technique for solving all sorts of problems. If you're writing a recursive function, you need to add the rec keyword immediately after let, like so:
let rec sumExampleFromStackOverflow n =
if n <= 0 then
0
else
n + sumExampleFromStackOverflow (n-1)
let t = 7
printfn "The value of t is %d" t
printfn "The sum of 1 through t is %d" (sumExampleFromStackOverflow t)
printfn "The value of t is %d" t
Note how I didn't need to make t mutable this time. In fact, I could have just called sumExampleFromStackOverflow 7 and it would have worked.
Now, this doesn't use a while loop, so it might not be what your teacher is looking for. And I see that s952163 has just updated his answer with a different solution. But you should really get used to the idea of recursion as soon as you can, because breaking the problem down into individual steps using recursion is a really powerful technique for solving a lot of problems in F#. So even though this isn't the answer you're looking for right now, it is the answer you're going to be looking for soon.
P.S. If you use any of the help you've gotten here, tell your teacher that you've done so, and give him the URL of this question (http://stackoverflow.com/questions/39698430/f-how-to-call-a-function-with-argument-byref-int) so he can read what you asked and what other people told you. If he's a good teacher, he won't lower your grade for doing that; in fact, he might raise it for being honest and upfront about how you solved the problem. But if you got help with your homework and you don't tell your teacher, 1) that's dishonest, and 2) you'll only hurt yourself in the long run, because he'll think you understand a concept that you maybe haven't understood yet.
UPDATE 2: s952163 suggests that I show you how to use the fold and scan functions, and I thought "Why not?" Keep in mind that these are advanced techniques, so you probably won't get assignments where you need to use fold for a while. But fold is basically a way to take any list and do a calculation that turns the list into a single value, in a generic way. With fold, you specify three things: the list you want to work with, the starting value for your calculation, and a function of two parameters that will do one step of the calculation. For example, if you're trying to add up all the numbers from 1 to n, your "one step" function would be let add a b = a + b. (There's an even more advanced feature of F# that I'm skipping in this explanation, because you should learn just one thing at a time. By skipping it, it keeps the add function simple and easy to understand.)
The way you would use fold looks like this:
let sumWithFold n =
let upToN = [1..n] // This is the list [1; 2; 3; ...; n]
let add a b = a + b
List.fold add 0 upToN
Note that I wrote List.fold. If upToN was an array, then I would have written Array.fold instead. The arguments to fold, whether it's List.fold or Array.fold, are, in order:
The function to do one step of your calculation
The initial value for your calculation
The list (if using List.fold) or array (if using Array.fold) that you want to do the calculation with.
Let me step you through what List.fold does. We'll pretend you've called your function with 4 as the value of n.
First step: the list is [1;2;3;4], and an internal valueSoFar variable inside List.fold is set to the initial value, which in our case is 0.
Next: the calculation function (in our case, add) is called with valueSoFar as the first parameter, and the first item of the list as the second parameter. So we call add 0 1 and get the result 1. The internal valueSoFar variable is updated to 1, and the rest of the list is [2;3;4]. Since that is not yet empty, List.fold will continue to run.
Next: the calculation function (add) is called with valueSoFar as the first parameter, and the first item of the remainder of the list as the second parameter. So we call add 1 2 and get the result 3. The internal valueSoFar variable is updated to 3, and the rest of the list is [3;4]. Since that is not yet empty, List.fold will continue to run.
Next: the calculation function (add) is called with valueSoFar as the first parameter, and the first item of the remainder of the list as the second parameter. So we call add 3 3 and get the result 6. The internal valueSoFar variable is updated to 6, and the rest of the list is [4] (that's a list with one item, the number 4). Since that is not yet empty, List.fold will continue to run.
Next: the calculation function (add) is called with valueSoFar as the first parameter, and the first item of the remainder of the list as the second parameter. So we call add 6 4 and get the result 10. The internal valueSoFar variable is updated to 10, and the rest of the list is [] (that's an empty list). Since the remainder of the list is now empty, List.fold will stop, and return the current value of valueSoFar as its final result.
So calling List.fold add 0 [1;2;3;4] will essentially return 0+1+2+3+4, or 10.
Now we'll talk about scan. The scan function is just like the fold function, except that instead of returning just the final value, it returns a list of the values produced at all the steps (including the initial value). (Or if you called Array.scan, it returns an array of the values produced at all the steps). In other words, if you call List.scan add 0 [1;2;3;4], it goes through the same steps as List.fold add 0 [1;2;3;4], but it builds up a result list as it does each step of the calculation, and returns [0;1;3;6;10]. (The initial value is the first item of the list, then each step of the calculation).
As I said, these are advanced functions, that your teacher won't be covering just yet. But I figured I'd whet your appetite for what F# can do. By using List.fold, you don't have to write a while loop, or a for loop, or even use recursion: all that is done for you! All you have to do is write a function that does one step of a calculation, and F# will do all the rest.
This is such a bad idea:
let mutable n = 7
let sumfunc2 (n: int byref) =
let mutable s = 0
while n >= 1 do
s <- n + (n - 1)
n <- n-1
printfn "%i" s
sumfunc2 (&n)
Totally agree with munn's comments, here's another way to implode:
let sumfunc3 (n: int) =
let mutable s = n
while s >= 1 do
let n = s + (s - 1)
s <- (s-1)
printfn "%i" n
sumfunc3 7

Inner Functions With SML NJ

I am a complete newbie to sml and am having trouble with the syntax for inner functions. What I need to do is take a list of a list of ints, average each list, and return a list of reals. This is the psuedo-ish code I have so far.
fun listAvg [] = 0
else (sum (x) div size (x))
fun sum[] = 0
| sum(head::rest)= head + sum rest;
fun size [] = 0
| size(head::rest) = 1 + size rest;
listAvg([[1,3,6,8,9], [4,2,6,5,1], [9,5,9,7], [5,4], [3,6,4,8]]);
any advice would be greatly appreciated. Thanks!
Use a let, as in
fun listAvg [] = 0
| listAvg x =
let
fun sum[] = 0
| sum(head::tail)= head + sum tail;
fun size [] = 0
| size(head::tail) = 1 + size tail;
in
(sum x) div (size x)
end
You have to pass an int list to this function e.g.
listAvg [1, 2, 3, 4];
This is no change to your code, except to rearrange the order and putting the keywords let, in, and end. If this is not homework, I would recommend using a few built-in standard library functions in the List structure that could reduce this function to two lines, including the pattern match on the empty list.
EDIT:
There are two possible meanings to "average a list of a list of ints." The first is to average each list and then take the average of the averages, and the second is to join the lists together into one long list and take the average over the entire list. The two methods are equivalent (except for rounding errors) when all the lists of ints are of the same length, but as your example shows, they don't have to be the same length.
Since this is homework, I'm not going to give you the answer directly, but consider the following, which might be helpful:
If you're using the first interpretation of "average a list of a list of ints:" there is a built-in SML function that will let you apply another function to each element of a list, and get the resulting list. This might be helpful for getting the individual averages, which you can then combine together into an overall average.
If you're using the second interpretation: there is a built-in SML function (it looks like an operator, but many of the things that look like operators in SML are just infix functions) to join two lists together, and a built-in SML function to apply a function to elements going down the list, along with an accumulator value, to generate a single value. You might be able to use those two functions to create one long list of all the numbers, which you can then average.

Generate a powerset with the help of a binary representation

I know that "a powerset is simply any number between 0 and 2^N-1 where N is number of set members and one in binary presentation denotes presence of corresponding member".
(Hynek -Pichi- Vychodil)
I would like to generate a powerset using this mapping from the binary representation to the actual set elements.
How can I do this with Erlang?
I have tried to modify this, but with no success.
UPD: My goal is to write an iterative algorithm that generates a powerset of a set without keeping a stack. I tend to think that binary representation could help me with that.
Here is the successful solution in Ruby, but I need to write it in Erlang.
UPD2: Here is the solution in pseudocode, I would like to make something similar in Erlang.
First of all, I would note that with Erlang a recursive solution does not necessarily imply it will consume extra stack. When a method is tail-recursive (i.e., the last thing it does is the recursive call), the compiler will re-write it into modifying the parameters followed by a jump to the beginning of the method. This is fairly standard for functional languages.
To generate a list of all the numbers A to B, use the library method lists:seq(A, B).
To translate a list of values (such as the list from 0 to 2^N-1) into another list of values (such as the set generated from its binary representation), use lists:map or a list comprehension.
Instead of splitting a number into its binary representation, you might want to consider turning that around and checking whether the corresponding bit is set in each M value (in 0 to 2^N-1) by generating a list of power-of-2-bitmasks. Then, you can do a binary AND to see if the bit is set.
Putting all of that together, you get a solution such as:
generate_powerset(List) ->
% Do some pre-processing of the list to help with checks later.
% This involves modifying the list to combine the element with
% the bitmask it will need later on, such as:
% [a, b, c, d, e] ==> [{1,a}, {2,b}, {4,c}, {8,d}, {16,e}]
PowersOf2 = [1 bsl (X-1) || X <- lists:seq(1, length(List))],
ListWithMasks = lists:zip(PowersOf2, List),
% Generate the list from 0 to 1^N - 1
AllMs = lists:seq(0, (1 bsl length(List)) - 1),
% For each value, generate the corresponding subset
lists:map(fun (M) -> generate_subset(M, ListWithMasks) end, AllMs).
% or, using a list comprehension:
% [generate_subset(M, ListWithMasks) || M <- AllMs].
generate_subset(M, ListWithMasks) ->
% List comprehension: choose each element where the Mask value has
% the corresponding bit set in M.
[Element || {Mask, Element} <- ListWithMasks, M band Mask =/= 0].
However, you can also achieve the same thing using tail recursion without consuming stack space. It also doesn't need to generate or keep around the list from 0 to 2^N-1.
generate_powerset(List) ->
% same preliminary steps as above...
PowersOf2 = [1 bsl (X-1) || X <- lists:seq(1, length(List))],
ListWithMasks = lists:zip(PowersOf2, List),
% call tail-recursive helper method -- it can have the same name
% as long as it has different arity.
generate_powerset(ListWithMasks, (1 bsl length(List)) - 1, []).
generate_powerset(_ListWithMasks, -1, Acc) -> Acc;
generate_powerset(ListWithMasks, M, Acc) ->
generate_powerset(ListWithMasks, M-1,
[generate_subset(M, ListWithMasks) | Acc]).
% same as above...
generate_subset(M, ListWithMasks) ->
[Element || {Mask, Element} <- ListWithMasks, M band Mask =/= 0].
Note that when generating the list of subsets, you'll want to put new elements at the head of the list. Lists are singly-linked and immutable, so if you want to put an element anywhere but the beginning, it has to update the "next" pointers, which causes the list to be copied. That's why the helper function puts the Acc list at the tail instead of doing Acc ++ [generate_subset(...)]. In this case, since we're counting down instead of up, we're already going backwards, so it ends up coming out in the same order.
So, in conclusion,
Looping in Erlang is idiomatically done via a tail recursive function or using a variation of lists:map.
In many (most?) functional languages, including Erlang, tail recursion does not consume extra stack space since it is implemented using jumps.
List construction is typically done backwards (i.e., [NewElement | ExistingList]) for efficiency reasons.
You generally don't want to find the Nth item in a list (using lists:nth) since lists are singly-linked: it would have to iterate the list over and over again. Instead, find a way to iterate the list once, such as how I pre-processed the bit masks above.

(Ordered) Set Partitions in fixed-size Blocks

Here is a function I would like to write but am unable to do so. Even if you
don't / can't give a solution I would be grateful for tips. For example,
I know that there is a correlation between the ordered represantions of the
sum of an integer and ordered set partitions but that alone does not help me in
finding the solution. So here is the description of the function I need:
The Task
Create an efficient* function
List<int[]> createOrderedPartitions(int n_1, int n_2,..., int n_k)
that returns a list of arrays of all set partions of the set
{0,...,n_1+n_2+...+n_k-1} in number of arguments blocks of size (in this
order) n_1,n_2,...,n_k (e.g. n_1=2, n_2=1, n_3=1 -> ({0,1},{3},{2}),...).
Here is a usage example:
int[] partition = createOrderedPartitions(2,1,1).get(0);
partition[0]; // -> 0
partition[1]; // -> 1
partition[2]; // -> 3
partition[3]; // -> 2
Note that the number of elements in the list is
(n_1+n_2+...+n_n choose n_1) * (n_2+n_3+...+n_n choose n_2) * ... *
(n_k choose n_k). Also, createOrderedPartitions(1,1,1) would create the
permutations of {0,1,2} and thus there would be 3! = 6 elements in the
list.
* by efficient I mean that you should not initially create a bigger list
like all partitions and then filter out results. You should do it directly.
Extra Requirements
If an argument is 0 treat it as if it was not there, e.g.
createOrderedPartitions(2,0,1,1) should yield the same result as
createOrderedPartitions(2,1,1). But at least one argument must not be 0.
Of course all arguments must be >= 0.
Remarks
The provided pseudo code is quasi Java but the language of the solution
doesn't matter. In fact, as long as the solution is fairly general and can
be reproduced in other languages it is ideal.
Actually, even better would be a return type of List<Tuple<Set>> (e.g. when
creating such a function in Python). However, then the arguments wich have
a value of 0 must not be ignored. createOrderedPartitions(2,0,2) would then
create
[({0,1},{},{2,3}),({0,2},{},{1,3}),({0,3},{},{1,2}),({1,2},{},{0,3}),...]
Background
I need this function to make my mastermind-variation bot more efficient and
most of all the code more "beautiful". Take a look at the filterCandidates
function in my source code. There are unnecessary
/ duplicate queries because I'm simply using permutations instead of
specifically ordered partitions. Also, I'm just interested in how to write
this function.
My ideas for (ugly) "solutions"
Create the powerset of {0,...,n_1+...+n_k}, filter out the subsets of size
n_1, n_2 etc. and create the cartesian product of the n subsets. However
this won't actually work because there would be duplicates, e.g.
({1,2},{1})...
First choose n_1 of x = {0,...,n_1+n_2+...+n_n-1} and put them in the
first set. Then choose n_2 of x without the n_1 chosen elements
beforehand and so on. You then get for example ({0,2},{},{1,3},{4}). Of
course, every possible combination must be created so ({0,4},{},{1,3},{2}),
too, and so on. Seems rather hard to implement but might be possible.
Research
I guess this
goes in the direction I want however I don't see how I can utilize it for my
specific scenario.
http://rosettacode.org/wiki/Combinations
You know, it often helps to phrase your thoughts in order to come up with a solution. It seems that then the subconscious just starts working on the task and notifies you when it found the solution. So here is the solution to my problem in Python:
from itertools import combinations
def partitions(*args):
def helper(s, *args):
if not args: return [[]]
res = []
for c in combinations(s, args[0]):
s0 = [x for x in s if x not in c]
for r in helper(s0, *args[1:]):
res.append([c] + r)
return res
s = range(sum(args))
return helper(s, *args)
print partitions(2, 0, 2)
The output is:
[[(0, 1), (), (2, 3)], [(0, 2), (), (1, 3)], [(0, 3), (), (1, 2)], [(1, 2), (), (0, 3)], [(1, 3), (), (0, 2)], [(2, 3), (), (0, 1)]]
It is adequate for translating the algorithm to Lua/Java. It is basically the second idea I had.
The Algorithm
As I already mentionend in the question the basic idea is as follows:
First choose n_1 elements of the set s := {0,...,n_1+n_2+...+n_n-1} and put them in the
first set of the first tuple in the resulting list (e.g. [({0,1,2},... if the chosen elements are 0,1,2). Then choose n_2 elements of the set s_0 := s without the n_1 chosen elements beforehand and so on. One such a tuple might be ({0,2},{},{1,3},{4}). Of
course, every possible combination is created so ({0,4},{},{1,3},{2}) is another such tuple and so on.
The Realization
At first the set to work with is created (s = range(sum(args))). Then this set and the arguments are passed to the recursive helper function helper.
helper does one of the following things: If all the arguments are processed return "some kind of empty value" to stop the recursion. Otherwise iterate through all the combinations of the passed set s of the length args[0] (the first argument after s in helper). In each iteration create the set s0 := s without the elements in c (the elements in c are the chosen elements from s), which is then used for the recursive call of helper.
So what happens with the arguments in helper is that they are processed one by one. helper may first start with helper([0,1,2,3], 2, 1, 1) and in the next invocation it is for example helper([2,3], 1, 1) and then helper([3], 1) and lastly helper([]). Of course another "tree-path" would be helper([0,1,2,3], 2, 1, 1), helper([1,2], 1, 1), helper([2], 1), helper([]). All these "tree-paths" are created and thus the required solution is generated.

What's a good way to structure variable nested loops?

Suppose you're working in a language with variable length arrays (e.g. with A[i] for all i in 1..A.length) and have to write a routine that takes n (n : 1..8) variable length arrays of items in a variable length array of length n, and needs to call a procedure with every possible length n array of items where the first is chosen from the first array, the second is chosen from the second array, and so forth.
If you want something concrete to visualize, imagine that your routine has to take data like:
[ [ 'top hat', 'bowler', 'derby' ], [ 'bow tie', 'cravat', 'ascot', 'bolo'] ... ['jackboots','galoshes','sneakers','slippers']]
and make the following procedure calls (in any order):
try_on ['top hat', 'bow tie', ... 'jackboots']
try_on ['top hat', 'bow tie', ... 'galoshes']
:
try_on ['derby','bolo',...'slippers']
This is sometimes called a chinese menu problem, and for fixed n can be coded quite simply (e.g. for n = 3, in pseudo code)
procedure register_combination( items : array [1..3] of vararray of An_item)
for each i1 from items[1]
for each i2 from items[2]
for each i3 from items[3]
register( [ii,i2,i3] )
But what if n can vary, giving a signature like:
procedure register_combination( items : vararray of vararray of An_item)
The code as written contained an ugly case statement, which I replaced with a much simpler solution. But I'm not sure it's the best (and it's surely not the only) way to refactor this.
How would you do it? Clever and surprising are good, but clear and maintainable are better--I'm just passing through this code and don't want to get called back. Concise, clear and clever would be ideal.
Edit: I'll post my solution later today, after others have had a chance to respond.
Teaser: I tried to sell a recursive solution, but they wouldn't go for it, so I had to stick to writing fortran in a HLL.
The answer I went with, posted below.
Either the recursive algorithm
procedure register_combination( items )
register_combination2( [], items [1:] )
procedure register_combination2( head, items)
if items == []
print head
else
for i in items[0]
register_combination2( head ++ i, items [1:] )
or the same with tail calls optimised out, using an array for the indices, and incrementing the last index until it reaches the length of the corresponding array, then carrying the increment up.
Recursion.
Or, better yet, trying to eliminate recursion using stack-like structures and while statements.
For your problem you stated (calling a function with variable arguments) it depends entirely on the programming language you're coding in; many of them allow for passing variable arguments.
Since they were opposed to recursion (don't ask) and I was opposed to messy case statements (which, as it turned out, were hiding a bug) I went with this:
procedure register_combination( items : vararray of vararray of An_item)
possible_combinations = 1
for each item_list in items
possible_combinations = possible_combinations * item_list.length
for i from 0 to possible_combinations-1
index = i
this_combination = []
for each item_list in items
item_from_this_list = index mod item_list.length
this_combination << item_list[item_from_this_list]
index = index div item_list.length
register_combination(this_combination)
Basically, I figure out how many combinations there are, assign each one a number, and then loop through the number producing the corresponding combination. Not a new trick, I suspect, but one worth knowing.
It's shorter, works for any practical combination of list lengths (if there are over 2^60 combinations, they have other problems), isn't recursive, and doesn't have the bug.