Julia: creating a method for Any vector with missing values - function

I would like to create a function that deals with missing values. However, when I tried to specify the missing type Array{Missing, 1}, it errors.
function f(x::Array{<:Number, 1})
# do something complicated
println("no missings.")
println(sum(x))
end
function f(x::Array{Missing, 1})
x = collect(skipmissing(x))
# do something complicated
println("removed missings.")
f(x)
end
f([2, 3, 5])
f([2, 3, 5, missing])
I understand that my type is not Missing but Array{Union{Missing, Int64},1}
When I specify this type, it works in the case above. However, I would like to work with all types (strings, floats etc., not only Int64).
I tried
function f(x::Array{Missing, 1})
...
end
But it errors again... Saying that
f (generic function with 1 method)
ERROR: LoadError: MethodError: no method matching f(::Array{Union{Missing, Int64},1})
Closest candidates are:
f(::Array{Any,1}) at ...
How can I say that I wand the type to be union missings with whatever?
EDIT (reformulation)
Let's have these 4 vectors and two functions dealing with strings and numbers.
x1 = [1, 2, 3]
x2 = [1, 2, 3, missing]
x3 = ["1", "2", "3"]
x4 = ["1", "2", "3", missing]
function f(x::Array{<:Number,1})
println(sum(x))
end
function f(x::Array{String,1})
println(join(x))
end
f(x) doesn't work for x2 and x3, because they are of type Array{Union{Missing, Int64},1} and Array{Union{Missing, String},1}, respectively.
It is possible to have only one function that detects whether the vector contains missings, removes them and then deals appropriately with it.
for instance:
function f(x::Array{Any, 1})
x = collect(skipmissing(x))
print("removed missings")
f(x)
end
But this doesn't work because Any indicates a mixed type (e.g., strings and nums) and does not mean string OR numbers or whatever.
EDIT 2 Partial fix
This works:
function f(x::Array)
x = collect(skipmissing(x))
print("removed missings")
f(x)
end
[But how, then, to specify the shape (number of dimensions) of the array...? (this might be an unrelated topic though)]

You can do it in the following way:
function f(x::Vector{<:Number})
# do something complicated
println("no missings.")
println(sum(x))
end
function f(x::Vector{Union{Missing,T}}) where {T<:Number}
x = collect(skipmissing(x))
# do something complicated
println("removed missings.")
f(x)
end
and now it works:
julia> f([2, 3, 5])
no missings.
10
julia> f([2, 3, 5, missing])
removed missings.
no missings.
10
EDIT:
I will try to answer the questions raised (if I miss something please add a comment).
First Vector{Union{Missing, <:Number}} is the same as Vector{Union{Missing, Number}} because of the scoping rules as tibL indicated as Vector{Union{Missing, <:Number}} translates to Array{Union{Missing, T} where T<:Number,1} and where clause is inside Array.
Second (here I am not sure if this is what you want). I understand you want the following behavior:
julia> g(x::Array{>:Missing,1}) = "$(eltype(x)) allows missing"
g (generic function with 2 methods)
julia> g(x::Array{T,1}) where T = "$(eltype(x)) does not allow missing"
g (generic function with 2 methods)
julia> g([1,2,3])
"Int64 does not allow missing"
julia> g([1,2,missing])
"Union{Missing, Int64} allows missing"
julia> g(["a",'a'])
"Any allows missing"
julia> g(Union{String,Char}["a",'a'])
"Union{Char, String} does not allow missing"
Note the last two line - although ["a", 'a'] does not contain missing the array has Any element type so it might contain missing. The last case excludes it.
Also you can see that you could change the second parameter of Array{T,N} to something else to get a different dimensionality.
Also this example works because the first method, as more specific, catches all cases that allow Missing and a second method, as more general, catches what is left (i.e. essentially what does not allow Missing).

Related

Are fast Anonymous functions automatic now?

Now that fast anonymous functions are native to julia, do I still have to use the decorator, or is it automatically implemented. Also when I pass a function as an argument into another function, can I static type it? What can I do to improve the run speed.
FastAnonymous is definitely not necessary anymore. Here's how you can verify this yourself:
julia> #noinline g(f, x) = f(x) # prevent inlining so you know it's general
g (generic function with 1 method)
julia> h1(x) = g(identity, x)
h1 (generic function with 1 method)
julia> h2(x) = g(sin, x)
h2 (generic function with 1 method)
julia> #code_warntype h1(1)
Variables
#self#::Core.Compiler.Const(h1, false)
x::Int64
Body::Int64
1 ─ %1 = Main.g(Main.identity, x)::Int64
└── return %1
julia> #code_warntype h2(1)
Variables
#self#::Core.Compiler.Const(h2, false)
x::Int64
Body::Float64
1 ─ %1 = Main.g(Main.sin, x)::Float64
└── return %1
julia> h3(x) = g(z->"I'm a string", x)
h3 (generic function with 1 method)
julia> #code_warntype h3(1)
Variables
#self#::Core.Compiler.Const(h3, false)
x::Int64
#9::getfield(Main, Symbol("##9#10"))
Body::String
1 ─ (#9 = %new(Main.:(##9#10)))
│ %2 = #9::Core.Compiler.Const(getfield(Main, Symbol("##9#10"))(), false)
│ %3 = Main.g(%2, x)::Core.Compiler.Const("I'm a string", false)
└── return %3
In every case Julia knows the return type, and that requires that it "understand" what your function-argument is doing. Moreover:
julia> m = first(methods(g))
g(f, x) in Main at REPL[1]:1
julia> m.specializations
Core.TypeMapEntry(Core.TypeMapEntry(Core.TypeMapEntry(nothing, Tuple{typeof(g),typeof(identity),Int64}, nothing, svec(), 1, -1, MethodInstance for g(::typeof(identity), ::Int64), true, true, false), Tuple{typeof(g),typeof(sin),Int64}, nothing, svec(), 1, -1, MethodInstance for g(::typeof(sin), ::Int64), true, true, false), Tuple{typeof(g),getfield(Main, Symbol("##9#10")),Int64}, nothing, svec(), 1, -1, MethodInstance for g(::getfield(Main, Symbol("##9#10")), ::Int64), true, true, false)
This is a bit hard to read, but if you look carefully you'll see that g has been compiled for 3 inputs:
Tuple{typeof(identity), Int64}
Tuple{typeof(sin), Int64}
Tuple{getfield(Main, Symbol("##9#10")),Int64}
(The compiled versions also take g itself as an extra argument, for reasons having to do with things like the internal implementation of keyword-argument handling, but let's ignore that for now.) The last one is the generated name for the type implementing the anonymous function. What this shows you is that each function has its own type, which is the reason why passing functions as arguments is fast.
For the gurus, there is one other factor that can come in to play: because type inference is subject to the unsolvable halting problem, there are circumstances where inference will decide that this is all getting too complex and abort "early." In such cases (which are relatively rare), it can help to force the compiler to specialize against a particular argument. In our example, that would mean declaring g as
#noinline g(f::F, x) where F = f(x)
rather than
#noinline g(f, x) = f(x)
That ::F is normally unnecessary and appears useless, but you can use it as a compiler-hint to increase the amount of effort used to infer the result. I don't recommend doing that by default (it makes your code a bit harder to read), but if you see weird performance problems it's one thing to try.

Comma separated binary arguments? - elixir

I've been learning elixir this month, and was in a situation where I wanted to convert a binary object into a list of bits, for pattern matching.
My research led me here, to an article showing a method for doing so. However, I don't fully understand one of the arguments passed to the extract function.
I could just copy and paste the code, but I'd like to understand what's going on under the hood here.
The argument is this: <<b :: size(1), bits :: bitstring>>.
What I understand
I understand that << x >> denotes a binary object x. Logically to me, it looks as though this is similar to performing: [head | tail] = list on a List, to get the first element, and then the remaining ones as a new list called tail.
What I don't understand
However, I'm not familiar with the syntax, and I have never seen :: in elixir, nor have I ever seen a binary object separated by a comma: ,. I also, haven't seen size(x) used in Elixir, and have never encountered a bitstring.
The Bottom Line
If someone, could explain exactly how the syntax for this argument breaks down, or point me towards a resource I would highly appreciate it.
For your convenience, the code from that article:
defmodule Bits do
# this is the public api which allows you to pass any binary representation
def extract(str) when is_binary(str) do
extract(str, [])
end
# this function does the heavy lifting by matching the input binary to
# a single bit and sends the rest of the bits recursively back to itself
defp extract(<<b :: size(1), bits :: bitstring>>, acc) when is_bitstring(bits) do
extract(bits, [b | acc])
end
# this is the terminal condition when we don't have anything more to extract
defp extract(<<>>, acc), do: acc |> Enum.reverse
end
IO.inspect Bits.extract("!!") # => [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1]
IO.inspect Bits.extract(<< 99 >>) #=> [0, 1, 1, 0, 0, 0, 1, 1]
Elixir pattern matching seems mind blowingly easy to use for
structured binary data.
Yep. You can thank the erlang inventors.
According to the documentation, <<x :: size(y)>> denotes a bitstring,
whos decimal value is x and is represented by a string of bits that is
y in length.
Let's dumb it down a bit: <<x :: size(y)>> is the integer x inserted into y bits. Examples:
<<1 :: size(1)>> => 1
<<1 :: size(2)>> => 01
<<1 :: size(3)>> => 001
<<2 :: size(3)>> => 010
<<2 :: size(4)>> => 0010
The number of bits in the binary type is divisible by 8, so a binary type has a whole number of bytes (1 byte = 8 bits). The number of bits in a bitstring is not divisible by 8. That's the difference between the binary type and the bitstring type.
I understand that << x >> denotes a binary object x. Logically to me,
it looks as though this is similar to performing: [head | tail] = list
on a List, to get the first element, and then the remaining ones as a
new list called tail.
Yes:
defmodule A do
def show_list([]), do: :ok
def show_list([head|tail]) do
IO.puts head
show_list(tail)
end
def show_binary(<<>>), do: :ok
def show_binary(<<char::binary-size(1), rest::binary>>) do
IO.puts char
show_binary(rest)
end
end
In iex:
iex(6)> A.show_list(["a", "b", "c"])
a
b
c
:ok
iex(7)> "abc" = <<"abc">> = <<"a", "b", "c">> = <<97, 98, 99>>
"abc"
iex(9)> A.show_binary(<<97, 98, 99>>)
a
b
c
:ok
Or you can interpret the integers in the binary as plain old integers:
def show(<<>>), do: :ok
def show(<<ascii_code::integer-size(8), rest::binary>>) do
IO.puts ascii_code
show(rest)
end
In iex:
iex(6)> A.show(<<97, 98, 99>>)
97
98
99
:ok
The utf8 type is super useful because it will grab as many bytes as required to get a whole utf8 character:
def show(<<>>), do: :ok
def show(<<char::utf8, rest::binary>>) do
IO.puts char
show(rest)
end
In iex:
iex(8)> A.show("ۑ")
8364
235
:ok
As you can see, the uft8 type returns the unicode codepoint of the character. To get the character as a string/binary:
def show(<<>>), do: :ok
def show(<<codepoint::utf8, rest::binary>>) do
IO.puts <<codepoint::utf8>>
show(rest)
end
You take the codepoint(an integer) and use it to create the binary/string <<codepoint::utf8>>.
In iex:
iex(1)> A.show("ۑ")
€
ë
:ok
You can't specify a size for the utf8 type, though, so if you want to read multiple utf8 characters, you have to specify multiple segments.
And of course, the segment rest::binary, i.e. a binary type with no size specified, is super useful. It can only appear at the end of a pattern, and rest::binary is like the greedy regex: (.*). The same goes for rest::bitstring.
Although the elixir docs don't mention it anywhere, the total number of bits in a segment, where a segment is one of those things:
| | |
v v v
<< 1::size(8), 1::size(16), 1::size(1) >>
is actually unit * size, where each type has a default unit. The default type for a segment is integer, so the type for each segment above defaults to integer. An integer has a default unit of 1 bit, so the total number of bits in the first segment is: 8 * 1 bit = 8 bits. The default unit for the binary type is 8 bits, so a segment like:
<< char::binary-size(6)>>
has a total size of 6 * 8 bits = 48 bits. Equivalently, size(6) is just the number of bytes. You can specify the unit just like you can the size, e.g. <<1::integer-size(2)-unit(3)>>. The total bit size of that segment is: 2 * 3 bits = 6 bits.
However, I'm not familiar with the syntax
Check this out:
def bitstr2bits(bitstr) do
for <<bit::integer-size(1) <- bitstr>>, do: bit
end
In iex:
iex(17)> A.bitstr2bits <<1::integer-size(2), 2::integer-size(2)>>
[0, 1, 1, 0]
Equivalently:
iex(3)> A.bitstr2bits(<<0b01::integer-size(2), 0b10::integer-size(2)>>)
[0, 1, 1, 0]
Elixir tends to abstract away recursion with library functions, so usually you don't have to come up with your own recursive definitions like at your link. However, that link shows one of the standard, basic recursion tricks: adding an accumulator to the function call to gather results that you want the function to return. That function could also be written like this:
def bitstr2bits(<<>>), do: []
def bitstr2bits(<<bit::integer-size(1), rest::bitstring>>) do
[bit | bitstr2bits(rest)]
end
The accumulator function at the link is tail recursive, which means it takes up a constant (small) amount of memory--no matter how many recursive function calls are needed to step through the bitstring. A bitstring with 10 million bits? Requiring 10 million recursive function calls? That would only require a small amount of memory. In the old days, the alternate definition I posted could potentially crash your program because it would take up more and more memory for each recursive function call, and if the bitstring were long enough the amount of memory needed would be too large, and you would get stackoverflow and your program would crash. However, erlang has optimized away the disadvantages of recursive functions that are not tail recursive.
You can read about all these here, short answer:
:: is similar as guard, like a when is_integer(a), in you case size(1) expect a 1 bit binary
, is a separator between matching binaries, like | in [x | []] or like comma in [a, b]
bitstring is a superset over binaries, you can read about it here and here, any binary can be respresented as bitstring
iex> ?h
104
iex> ?e
101
iex> ?l
108
iex> ?o
111
iex> <<104, 101, 108, 108, 111>>
"hello"
iex> [104, 101, 108, 108, 111]
'hello'
but not vice versa
iex> <<1, 2, 3>>
<<1, 2, 3>>
After some research, I realized I overlooked some important information located at: elixir-lang.
According to the documentation, <<x :: size(y)>> denotes a bitstring, whos decimal value is x and is represented by a string of bits that is y in length.
Furthermore, <<binary>> will always attempt to conglomerate values in a left-first direction, into bytes or 8-bits, however, if the number of bits is not divisible by 8, there will by x bytes, followed by a bitstring.
For example:
iex> <<3::size(5), 5::size(6)>> # <<00011, 000101>>
<<24, 5::size(3)>> # automatically shifted to: <<00011000(24) , 101>>
Now, elixir also lets us pattern match binaries, and bitstrings like so:
iex> <<3 :: size(2), b :: bitstring>> = <<61 :: size(6)>> # [11] [1101]
iex> b
<<13 :: size(4)>> # [1101]
So, i completly misunderstood binaries and biststrings, and pattern matching between the two.
Not really the answer to the question stated, but I’d put it here for the sake of formatting. In elixir we usually use Kernel.SpecialForms.for/1 comprehension for bitstring generation.
for << b :: size(1) <- "!!" >>, do: b
#⇒ [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1]
for << b :: size(1) <- <<99>> >>, do: b
#⇒ [0, 1, 1, 0, 0, 0, 1, 1]
I wanted to use the bits, in an 8 bit binary to toggle conditions. So
[b1, b2, ...] = extract(<<binary>>)
I then wanted to say:
if b1, do: x....
if b2, do: y...
Is there a better way to do what I'm trying to do, instead of pattern
matching?
First of all, the only terms that evaluate to false in elixir are false and nil (just like in ruby):
iex(18)> x = 1
1
iex(19)> y = 0
0
iex(20)> if x, do: IO.puts "I'm true."
I'm true.
:ok
iex(21)> if y, do: IO.puts "I'm true."
I'm true.
:ok
Although, the fix is easy:
if b1 == 1, do: ...
Extracting the bits into a list is unnecessary because you can just iterate the bitstring:
def check_bits(<<>>), do: :ok
def check_bits(<<bit::integer-size(1), rest::bitstring>>) do
if bit == 1, do: IO.puts "bit is on"
check_bits(rest)
end
In other words, you can treat a bitstring similarly to a list.
Or, instead of performing the logic in the body of the function to determine whether the bit is 1, you can use pattern matching in the head of the function:
def check_bits(<<>>), do: :ok
def check_bits(<< 1::integer-size(1), rest::bitstring >>) do
IO.puts "The bit is 1."
check_bits(rest)
end
def check_bits(<< 0::integer-size(1), rest::bitstring >>) do
IO.puts "The bit is 0."
check_bits(rest)
end
Instead of using a variable, bit, for the match like here:
bit::integer-size(1)
...you use a literal value, 1:
1::integer-size(1)
The only thing that can match a literal value is the literal value itself. As a result, the pattern:
<< 1::integer-size(1), rest::bitstring >>
will only match a bitstring where the first bit, integer-size(1), is 1. The literal matching employed there is similar to doing the following with a list:
def list_contains_4([4|_tail]) do
IO.puts "found a 4"
true #end the recursion and return true
end
def list_contains_4([head|tail]) do
IO.puts "#{head} is not a 4"
list_contains_4(tail)
end
def list_contains_4([]), do: false
The first function clause tries to match the literal 4 at the head of the list. If the head of the list is not 4, there's no match; so elixir moves on to the next function clause, and in the next function clause the variable head will match anything in the list.
Using pattern matching in the head of a function rather than performing logic in the body of a function is considered more stylish and efficient in erlang.

In Julia is it worth type-narrowing a dictionary returned by `JSON.parsefile`

I’m writing Julia code whose inputs are json files, that performs analysis in (the field of mathematical finance) and writes results as json. The code is a port from R in the hope of performance improvement.
I parse the input files using JSON.parsefile. This returns a Dict in which I observe that all vectors are of type Array{Any,1}. As it happens, I know that the input file will never contain vectors of mixed type, such as some Strings and some Numbers.
So I wrote the following code, which seems to work well and is “safe” in the sense that if the calls to convert fail then a vector continues to have type Array{Any,1}.
function typenarrow!(d::Dict)
for k in keys(d)
if d[k] isa Array{Any,1}
d[k] = typenarrow(d[k])
elseif d[k] isa Dict
typenarrow!(d[k])
end
end
end
function typenarrow(v::Array{Any,1})
for T in [String,Int64,Float64,Bool,Vector{Float64}]
try
return(convert(Vector{T},v))
catch; end
end
return(v)
end
My question is: Is this worth doing? Can I expect code that processes the contents of the Dict to execute faster if I do this type narrowing? I think the answer is yes in that the Julia performance tips recommend to “Annotate values taken from untyped locations” and this approach ensures there are no “untyped locations”.
There are two levels of the answer to this question:
Level 1
Yes, it will help the performance of the code. See for instance the following benchmark:
julia> using BenchmarkTools
julia> x = Any[1 for i in 1:10^6];
julia> y = [1 for i in 1:10^6];
julia> #btime sum($x)
26.507 ms (477759 allocations: 7.29 MiB)
1000000
julia> #btime sum($y)
226.184 μs (0 allocations: 0 bytes)
1000000
You can write your typenarrow function using a bit simpler approach like this:
typenarrow(x) = [v for v in x]
as using the comprehension will produce a vector of concrete type (assuming your source vector is homogeneous)
Level 2
This is not fully optimal. The problem that is still left is that you have a Dict that is a container with abstract type parameter (see https://docs.julialang.org/en/latest/manual/performance-tips/#Avoid-containers-with-abstract-type-parameters-1). Therefore in order for the computations to be fast you have to use a barrier function (see https://docs.julialang.org/en/latest/manual/performance-tips/#kernel-functions-1) or use type annotation for variables you introduce (see https://docs.julialang.org/en/v1/manual/types/index.html#Type-Declarations-1).
In the ideal world your Dict would have keys and values of homogeneous types and all would be maximally fast then, but if I understand your code correctly values in your case are not homogeneous.
EDIT
In order to solve the Level 2 isuue you can convert Dict into NamedTuple like this (this is a minimal example assuming that Dicts only nest in Dicts directly, but it should be easy enough to extend if you want more flexibility).
First, the function performing the conversion looks like:
function typenarrow!(d::Dict)
for k in keys(d)
if d[k] isa Array{Any,1}
d[k] = [v for v in d[k]]
elseif d[k] isa Dict
d[k] = typenarrow!(d[k])
end
end
NamedTuple{Tuple(Symbol.(keys(d)))}(values(d))
end
Now a MWE of its use:
julia> using JSON
julia> x = """
{
"name": "John",
"age": 27,
"values": {
"v1": [1,2,3],
"v2": [1.5,2.5,3.5]
},
"v3": [1,2,3]
}
""";
julia> j1 = JSON.parse(x)
Dict{String,Any} with 4 entries:
"name" => "John"
"values" => Dict{String,Any}("v2"=>Any[1.5, 2.5, 3.5],"v1"=>Any[1, 2, 3])
"age" => 27
"v3" => Any[1, 2, 3]
julia> j2 = typenarrow!(j1)
(name = "John", values = (v2 = [1.5, 2.5, 3.5], v1 = [1, 2, 3]), age = 27, v3 = [1, 2, 3])
julia> dump(j2)
NamedTuple{(:name, :values, :age, :v3),Tuple{String,NamedTuple{(:v2, :v1),Tuple{Array{Float64,1},Array{Int64,1}}},Int64,Array{Int64,1}}}
name: String "John"
values: NamedTuple{(:v2, :v1),Tuple{Array{Float64,1},Array{Int64,1}}}
v2: Array{Float64}((3,)) [1.5, 2.5, 3.5]
v1: Array{Int64}((3,)) [1, 2, 3]
age: Int64 27
v3: Array{Int64}((3,)) [1, 2, 3]
The beauty of this approach is that Julia will know all types in j2, so if you pass j2 to any function as a parameter all calculations inside this function will be fast.
The downside of this approach is that a function taking j2 has to be pre-compiled, which might be problematic if j2 structure is huge (as then the structure of resulting NamedTuple is complex) and the amount of work your function does is relatively small. But for small JSON-s (small in the sense of structure, as vectors held in them can be large - their size does not add to the complexity) this approach has proven to be efficient in several applications I have developed.

julia lang - how to apply multiple functions to a value

I would like to apply a set of functions to a value and get a set of values as output. I see in help?> groupby (DataFrames package) we can do:
> df |> groupby(:a) |> [sum, length]
> df |> groupby([:a, :b]) |> [sum, length]
but can we do
> [sum, length](groupby([:a, :b]))
MethodError: objects of type Array{Function,1} are not callable
square brackets [] for indexing an Array.
eval_user_input(::Any, ::Base.REPL.REPLBackend) at ./REPL.jl:64
in macro expansion at ./REPL.jl:95 [inlined]
in (::Base.REPL.##3#4{Base.REPL.REPLBackend})() at ./event.jl:68
or even
> [sum, length](1:5)
I would expect the output:
[15, 5]
Yes and no. (i.e. yes it's possible, but no, not with that syntax):
No: The syntax you see with |> and dataframes is not general syntax. It's just how the |> method is defined for dataframes. See its definition in file grouping.jl (line 377) and you'll see it's just a wrapper to another function, and it's defined to either accept a function, or a vector of functions.
PS: Note that the generic |> which "pipes" an argument into a function, only expects 1-argument functions on the right hand side, and has very little to do with this particular "dataframe-overloaded" method.
Yes:
You can apply a set of functions to a set of inputs in other ways.
One simple way, e.g. would be via a list comprehension:
julia> a = [1 2 3;2 3 4];
julia> [f(a) for f in [sum, length, size]]
3-element Array{Any,1}:
15
6
(2,3)
Or using map:
julia> map( (x) -> x(a), [sum, length, size])
etc.
PS: If you're keen to use |> to achieve this, clearly you could also do something like this:
julia> a |> (x) -> [sum(x), length(x), size(x)]
but presumably that defeats the purpose of what you're trying to do :)
Your proposed syntax is possible in Julia by adding a method to the type Array{T} (here, T is restricted to subtypes of Function):
julia> (a::Array{T}){T<:Function}(x) = [f(x) for f in a]
julia> [sin cos; exp sqrt](0)
2×2 Array{Float64,2}:
0.0 1.0
1.0 0.0
However, this has a large overhead if the number of functions is small. For maximum speed, one can use Tuples and a #generated function to unroll the loop manually:
julia> #generated (t::NTuple{N, Function}){N}(x) = :($((:(t[$i](x)) for i in 1:N)...),)
julia> (cos, sin)(0)
(1.0,0.0)

Wrong arity of simple function in clojure

I've started to learn clojure. In my book there is following exercise:
Write a function, mapset, that works like map except the return value is a set:
(mapset inc [1 1 2 2])
; => #{2 3}
I've started with something like this:
(defn mapset
[vect]
(set vect))
The result is error
"Wrong number of args (2) passed to: core/mapset"
I tried [& args] as well.
So, the question is: how can I solve such problem?
Take a closer look at your call to mapset:
(mapset inc [1 1 2 2])
Since code is data, this "call" is just a list of three elements:
The symbol mapset
The symbol inc
The vector [1 1 2 2]
When you evaluate this code, Clojure will see that it is a list and proceed to evaluate each of the items in that list (once it determines that it isn't a special form or macro), so it will then have a new list of three elements:
The function to which the symbol core/mapset was bound
The function to which the symbol clojure.core/inc was bound
The vector [1 1 2 2]
Finally, Clojure will call the first element of the list with the rest of the elements as arguments. In this case, there are two arguments in the rest of the list, but in your function definition, you only accounted for one:
(defn mapset
[vect]
(set vect))
To remedy this, you could implement mapset as follows:
(defn mapset
[f vect]
(set (map f vect)))
Now, when you call (mapset inc [1 1 2 2]), the argument f will be found to the function clojure.core/inc, and the argument vect will be bound to the vector [1 1 2 2].
Your definition of mapset takes a single argument vect
At a minimum you need to take 2 arguments, a function and a sequence
(defn mapset [f xs] (set (map f xs)))`
But it is interesting to think about this as the composition of 2 functions also:
(def mapset (comp set map))