Now that fast anonymous functions are native to Julia, do I still have to use the decorator, or is it applied automatically? Also, when I pass a function as an argument to another function, can I statically type it? What can I do to improve the run speed?
FastAnonymous is definitely not necessary anymore. Here's how you can verify this yourself:
julia> @noinline g(f, x) = f(x) # prevent inlining so you know it's general
g (generic function with 1 method)
julia> h1(x) = g(identity, x)
h1 (generic function with 1 method)
julia> h2(x) = g(sin, x)
h2 (generic function with 1 method)
julia> @code_warntype h1(1)
Variables
#self#::Core.Compiler.Const(h1, false)
x::Int64
Body::Int64
1 ─ %1 = Main.g(Main.identity, x)::Int64
└── return %1
julia> @code_warntype h2(1)
Variables
#self#::Core.Compiler.Const(h2, false)
x::Int64
Body::Float64
1 ─ %1 = Main.g(Main.sin, x)::Float64
└── return %1
julia> h3(x) = g(z->"I'm a string", x)
h3 (generic function with 1 method)
julia> @code_warntype h3(1)
Variables
#self#::Core.Compiler.Const(h3, false)
x::Int64
#9::getfield(Main, Symbol("##9#10"))
Body::String
1 ─ (#9 = %new(Main.:(##9#10)))
│ %2 = #9::Core.Compiler.Const(getfield(Main, Symbol("##9#10"))(), false)
│ %3 = Main.g(%2, x)::Core.Compiler.Const("I'm a string", false)
└── return %3
In every case Julia knows the return type, and that requires that it "understand" what your function-argument is doing. Moreover:
julia> m = first(methods(g))
g(f, x) in Main at REPL[1]:1
julia> m.specializations
Core.TypeMapEntry(Core.TypeMapEntry(Core.TypeMapEntry(nothing, Tuple{typeof(g),typeof(identity),Int64}, nothing, svec(), 1, -1, MethodInstance for g(::typeof(identity), ::Int64), true, true, false), Tuple{typeof(g),typeof(sin),Int64}, nothing, svec(), 1, -1, MethodInstance for g(::typeof(sin), ::Int64), true, true, false), Tuple{typeof(g),getfield(Main, Symbol("##9#10")),Int64}, nothing, svec(), 1, -1, MethodInstance for g(::getfield(Main, Symbol("##9#10")), ::Int64), true, true, false)
This is a bit hard to read, but if you look carefully you'll see that g has been compiled for 3 inputs:
Tuple{typeof(identity), Int64}
Tuple{typeof(sin), Int64}
Tuple{getfield(Main, Symbol("##9#10")),Int64}
(The compiled versions also take g itself as an extra argument, for reasons having to do with things like the internal implementation of keyword-argument handling, but let's ignore that for now.) The last one is the generated name for the type implementing the anonymous function. What this shows you is that each function has its own type, which is the reason why passing functions as arguments is fast.
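As a quick check of the claim that each function has its own type, here is a minimal sketch. The names apply and anon are made up for illustration and are not part of the session above:

```julia
# Hypothetical helper, equivalent in spirit to g above.
apply(f, x) = f(x)

anon = z -> "I'm a string"   # an anonymous function gets a generated type

# Named functions each have their own concrete type...
distinct_types = typeof(sin) != typeof(identity)
# ...and all of those types are subtypes of the abstract type Function.
all_functions = typeof(sin) <: Function && typeof(anon) <: Function
# Because the argument type pins down the callee, apply can be specialized per function.
results = (apply(identity, 1), apply(sin, 0.0), apply(anon, 1))
```

Since typeof(sin) and typeof(identity) differ, dispatching on the function argument selects a separately compiled specialization of apply for each one.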
For the gurus, there is one other factor that can come into play: because type inference is subject to the unsolvable halting problem, there are circumstances where inference will decide that this is all getting too complex and abort "early." In such cases (which are relatively rare), it can help to force the compiler to specialize against a particular argument. In our example, that would mean declaring g as
@noinline g(f::F, x) where F = f(x)
rather than
@noinline g(f, x) = f(x)
That ::F is normally unnecessary and appears useless, but you can use it as a compiler-hint to increase the amount of effort used to infer the result. I don't recommend doing that by default (it makes your code a bit harder to read), but if you see weird performance problems it's one thing to try.
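A small sketch of the two spellings side by side, modeled on the situation where a function argument is only passed through to another call (the case in which, as far as I know, Julia may decline to specialize). The names halves_plain, halves_hinted, and square are hypothetical:

```julia
# f is only forwarded to ntuple, so the compiler may choose not to specialize on it.
halves_plain(f, n) = ntuple(f, div(n, 2))

# The type parameter F is the hint: it asks for a specialization per function type.
halves_hinted(f::F, n) where {F} = ntuple(f, div(n, 2))

square(i) = i^2

# Both versions compute the same result; only the compilation strategy may differ.
same = halves_plain(square, 6) == halves_hinted(square, 6)
```

Any difference shows up in timing or in @code_warntype output, not in the returned values, which is why the hint is safe to add when you suspect a problem.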
How can I declare a Julia function that returns a function with a specific signature. For example, say I want to return a function that takes an Int and returns an Int:
function buildfunc()::?????
mult(x::Int) = x * 2
return mult
end
What should the question marks be replaced with?
One thing needs to be made clear.
Adding a type declaration to the return value is just a conversion followed by an assertion, not part of the function's signature. To understand what is going on, look at the lowered code of the function (lowering is a stage that happens before compilation):
julia> f(a::Int)::Int = 2a
f (generic function with 1 method)
julia> @code_lowered f(5)
CodeInfo(
1 ─ %1 = Main.Int
│ %2 = 2 * a
│ %3 = Base.convert(%1, %2)
│ %4 = Core.typeassert(%3, %1)
└── return %4
)
In this case, since the return type is obvious, the assertion is actually removed during compilation (try @code_native f(5) to see for yourself).
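To make the convert-then-assert behaviour concrete, here is a minimal sketch with a hypothetical function halve whose body does not naturally return an Int:

```julia
# The return annotation converts the result to Int, then asserts the type.
halve(x::Int)::Int = x / 2      # x / 2 is a Float64

ok = halve(4)                   # 2.0 converts exactly to the Int 2

# 3 / 2 == 1.5 has no exact Int representation, so the conversion throws.
failed = try
    halve(3)
    false
catch err
    err isa InexactError
end
```

The annotation therefore changes what the function returns (or whether it returns at all), but it does not restrict which methods exist or how dispatch selects them.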
If for some reason you want to generate functions, I recommend using the @generated macro. Be warned: metaprogramming is usually overkill for solving a typical Julia problem.
@generated function f2(x)
    if x <: Int
        quote
            2x
        end
    else
        quote
            10x
        end
    end
end
Now we have a function f2 where the source code of f2 is going to depend on the parameter type:
julia> f2(3)
6
julia> f2(3.)
30.0
Note that this function generation is actually happening during the compile time:
julia> @code_lowered f2(2)
CodeInfo(
# REPL[34]:1 within `f2'
┌ # REPL[34]:4 within `macro expansion'
1 ─│ %1 = 2 * x
└──│ return %1
└
)
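For comparison, and in keeping with the warning that metaprogramming is usually overkill, the same type-dependent behaviour can be sketched with ordinary multiple dispatch. The name f2_dispatch is hypothetical and not part of the original answer:

```julia
# Mirrors f2 above without any metaprogramming.
f2_dispatch(x::Int) = 2x   # chosen when the argument is an Int
f2_dispatch(x) = 10x       # fallback for every other argument type

int_result = f2_dispatch(3)
float_result = f2_dispatch(3.0)
```

Dispatch picks the most specific matching method, so the Int method wins for f2_dispatch(3) and the untyped fallback handles the Float64 call.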
Hope that clears things up.
You can use the Function type for this purpose. From the Julia documentation:
Function is the abstract type of all functions
function b(c::Int64)::Int64
    return c + 2
end
function a()::Function
    return b
end
Which prints:
julia> println(a()(2));
4
Julia will throw an exception for Float64 input:
julia> println(a()(2.0));
ERROR: MethodError: no method matching b(::Float64) Closest candidates are: b(::Int64)
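Note that the ::Function annotation only asserts that some function is returned; it says nothing about the argument or return types of that function. If you want to check the signature at run time, one option is hasmethod. The sketch below uses the hypothetical names add_two and make_adder as stand-ins for b and a:

```julia
add_two(c::Int64)::Int64 = c + 2   # stand-in for b above

make_adder()::Function = add_two   # only asserts "this returns a function"

returned = make_adder()

# Ask whether a method exists for a given argument tuple type.
accepts_int = hasmethod(returned, Tuple{Int64})
accepts_float = hasmethod(returned, Tuple{Float64})
```

Here accepts_int is true and accepts_float is false, matching the MethodError shown above for a Float64 argument.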
I'm writing Julia code whose inputs are JSON files, which performs analysis (in the field of mathematical finance) and writes results as JSON. The code is a port from R, in the hope of a performance improvement.
I parse the input files using JSON.parsefile. This returns a Dict in which I observe that all vectors are of type Array{Any,1}. As it happens, I know that the input file will never contain vectors of mixed type, such as some Strings and some Numbers.
So I wrote the following code, which seems to work well and is “safe” in the sense that if the calls to convert fail then a vector continues to have type Array{Any,1}.
function typenarrow!(d::Dict)
    for k in keys(d)
        if d[k] isa Array{Any,1}
            d[k] = typenarrow(d[k])
        elseif d[k] isa Dict
            typenarrow!(d[k])
        end
    end
end
function typenarrow(v::Array{Any,1})
    for T in [String,Int64,Float64,Bool,Vector{Float64}]
        try
            return convert(Vector{T}, v)
        catch; end
    end
    return v
end
My question is: Is this worth doing? Can I expect code that processes the contents of the Dict to execute faster if I do this type narrowing? I think the answer is yes in that the Julia performance tips recommend to “Annotate values taken from untyped locations” and this approach ensures there are no “untyped locations”.
There are two levels of the answer to this question:
Level 1
Yes, it will help the performance of the code. See for instance the following benchmark:
julia> using BenchmarkTools
julia> x = Any[1 for i in 1:10^6];
julia> y = [1 for i in 1:10^6];
julia> @btime sum($x)
26.507 ms (477759 allocations: 7.29 MiB)
1000000
julia> @btime sum($y)
226.184 μs (0 allocations: 0 bytes)
1000000
You can write your typenarrow function using a bit simpler approach like this:
typenarrow(x) = [v for v in x]
as using the comprehension will produce a vector of concrete type (assuming your source vector is homogeneous)
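A quick sketch of what the comprehension does to the element type, using the hypothetical name narrow for the one-liner and only homogeneous inputs, as assumed above:

```julia
narrow(x) = [v for v in x]    # same idea as the typenarrow one-liner

ints = narrow(Any[1, 2, 3])      # source vector has element type Any
strs = narrow(Any["a", "b"])

int_eltype = eltype(ints)        # narrowed to the type of the actual elements
str_eltype = eltype(strs)
```

The values are unchanged; only the container's element type is tightened, which is what lets later code such as sum run on a concretely typed array.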
Level 2
This is not fully optimal. The problem that is still left is that you have a Dict that is a container with abstract type parameter (see https://docs.julialang.org/en/latest/manual/performance-tips/#Avoid-containers-with-abstract-type-parameters-1). Therefore in order for the computations to be fast you have to use a barrier function (see https://docs.julialang.org/en/latest/manual/performance-tips/#kernel-functions-1) or use type annotation for variables you introduce (see https://docs.julialang.org/en/v1/manual/types/index.html#Type-Declarations-1).
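The two remedies mentioned here can be sketched as follows. The function names kernel, process, and process_annotated and the key "values" are assumptions made for illustration:

```julia
# Barrier (kernel) function: compiled separately for each concrete vector type.
kernel(v::AbstractVector) = sum(abs2, v)

function process(d::Dict{String,Any})
    v = d["values"]      # inferred only as Any inside this function
    return kernel(v)     # one dynamic dispatch, then specialized code runs
end

function process_annotated(d::Dict{String,Any})
    v = d["values"]::Vector{Float64}   # type assertion; throws if the type differs
    return sum(abs2, v)                # the compiler now knows the type of v
end

d = Dict{String,Any}("values" => [1.0, 2.0, 3.0])
barrier_result = process(d)
annotated_result = process_annotated(d)
```

The barrier version stays generic over whatever vector type the Dict holds, while the annotated version is stricter and fails early if the data is not a Vector{Float64}.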
In the ideal world your Dict would have keys and values of homogeneous types and all would be maximally fast then, but if I understand your code correctly values in your case are not homogeneous.
EDIT
In order to solve the Level 2 issue you can convert the Dict into a NamedTuple like this (this is a minimal example assuming that Dicts only nest in Dicts directly, but it should be easy enough to extend if you want more flexibility).
First, the function performing the conversion looks like:
function typenarrow!(d::Dict)
    for k in keys(d)
        if d[k] isa Array{Any,1}
            d[k] = [v for v in d[k]]
        elseif d[k] isa Dict
            d[k] = typenarrow!(d[k])
        end
    end
    NamedTuple{Tuple(Symbol.(keys(d)))}(values(d))
end
Now a MWE of its use:
julia> using JSON
julia> x = """
{
"name": "John",
"age": 27,
"values": {
"v1": [1,2,3],
"v2": [1.5,2.5,3.5]
},
"v3": [1,2,3]
}
""";
julia> j1 = JSON.parse(x)
Dict{String,Any} with 4 entries:
"name" => "John"
"values" => Dict{String,Any}("v2"=>Any[1.5, 2.5, 3.5],"v1"=>Any[1, 2, 3])
"age" => 27
"v3" => Any[1, 2, 3]
julia> j2 = typenarrow!(j1)
(name = "John", values = (v2 = [1.5, 2.5, 3.5], v1 = [1, 2, 3]), age = 27, v3 = [1, 2, 3])
julia> dump(j2)
NamedTuple{(:name, :values, :age, :v3),Tuple{String,NamedTuple{(:v2, :v1),Tuple{Array{Float64,1},Array{Int64,1}}},Int64,Array{Int64,1}}}
name: String "John"
values: NamedTuple{(:v2, :v1),Tuple{Array{Float64,1},Array{Int64,1}}}
v2: Array{Float64}((3,)) [1.5, 2.5, 3.5]
v1: Array{Int64}((3,)) [1, 2, 3]
age: Int64 27
v3: Array{Int64}((3,)) [1, 2, 3]
The beauty of this approach is that Julia will know all types in j2, so if you pass j2 to any function as a parameter all calculations inside this function will be fast.
The downside of this approach is that a function taking j2 has to be compiled for the specific type of j2, which might be problematic if the structure of j2 is huge (as then the type of the resulting NamedTuple is complex) and the amount of work your function does is relatively small. But for small JSONs (small in the sense of structure, as the vectors held in them can be large; their size does not add to the complexity) this approach has proven to be efficient in several applications I have developed.
I would like to create a function that deals with missing values. However, when I tried to specify the missing type Array{Missing, 1}, it errors.
function f(x::Array{<:Number, 1})
    # do something complicated
    println("no missings.")
    println(sum(x))
end
function f(x::Array{Missing, 1})
    x = collect(skipmissing(x))
    # do something complicated
    println("removed missings.")
    f(x)
end
f([2, 3, 5])
f([2, 3, 5, missing])
I understand that my type is not Missing but Array{Union{Missing, Int64},1}
When I specify this type, it works in the case above. However, I would like to work with all types (strings, floats etc., not only Int64).
I tried
function f(x::Array{Missing, 1})
...
end
But it errors again... Saying that
f (generic function with 1 method)
ERROR: LoadError: MethodError: no method matching f(::Array{Union{Missing, Int64},1})
Closest candidates are:
f(::Array{Any,1}) at ...
How can I say that I want the type to be a union of Missing with whatever?
EDIT (reformulation)
Let's have these 4 vectors and two functions dealing with strings and numbers.
x1 = [1, 2, 3]
x2 = [1, 2, 3, missing]
x3 = ["1", "2", "3"]
x4 = ["1", "2", "3", missing]
function f(x::Array{<:Number,1})
    println(sum(x))
end
function f(x::Array{String,1})
    println(join(x))
end
f(x) doesn't work for x2 and x4, because they are of type Array{Union{Missing, Int64},1} and Array{Union{Missing, String},1}, respectively.
Is it possible to have only one function that detects whether the vector contains missings, removes them, and then deals with it appropriately?
For instance:
function f(x::Array{Any, 1})
    x = collect(skipmissing(x))
    print("removed missings")
    f(x)
end
But this doesn't work because Any indicates a mixed type (e.g., strings and nums) and does not mean string OR numbers or whatever.
EDIT 2 Partial fix
This works:
function f(x::Array)
    x = collect(skipmissing(x))
    print("removed missings")
    f(x)
end
[But how, then, to specify the shape (number of dimensions) of the array...? (this might be an unrelated topic though)]
You can do it in the following way:
function f(x::Vector{<:Number})
    # do something complicated
    println("no missings.")
    println(sum(x))
end
function f(x::Vector{Union{Missing,T}}) where {T<:Number}
    x = collect(skipmissing(x))
    # do something complicated
    println("removed missings.")
    f(x)
end
and now it works:
julia> f([2, 3, 5])
no missings.
10
julia> f([2, 3, 5, missing])
removed missings.
no missings.
10
EDIT:
I will try to answer the questions raised (if I miss something please add a comment).
First, Vector{Union{Missing, <:Number}} is the same as Vector{Union{Missing, Number}} because of the scoping rules, as tibL indicated: Vector{Union{Missing, <:Number}} translates to Array{Union{Missing, T} where T<:Number,1}, so the where clause sits inside Array.
Second (here I am not sure if this is what you want). I understand you want the following behavior:
julia> g(x::Array{>:Missing,1}) = "$(eltype(x)) allows missing"
g (generic function with 2 methods)
julia> g(x::Array{T,1}) where T = "$(eltype(x)) does not allow missing"
g (generic function with 2 methods)
julia> g([1,2,3])
"Int64 does not allow missing"
julia> g([1,2,missing])
"Union{Missing, Int64} allows missing"
julia> g(["a",'a'])
"Any allows missing"
julia> g(Union{String,Char}["a",'a'])
"Union{Char, String} does not allow missing"
Note the last two cases: although ["a", 'a'] does not contain missing, the array has element type Any, so it might contain missing. The last case excludes it.
Also you can see that you could change the second parameter of Array{T,N} to something else to get a different dimensionality.
Also this example works because the first method, as more specific, catches all cases that allow Missing and a second method, as more general, catches what is left (i.e. essentially what does not allow Missing).
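Putting the pieces together for the reformulated question (numbers and strings, with or without missing), here is one possible sketch. The name combine is hypothetical, the functions return values instead of printing, and the bound T<:Union{Number,String} is my assumption about which element types should be supported:

```julia
combine(x::Vector{<:Number}) = sum(x)
combine(x::Vector{String}) = join(x)

# T is bound outside Vector, so this matches Vector{Union{Missing,Int64}},
# Vector{Union{Missing,String}}, and so on, but not Vector{Any}.
function combine(x::Vector{Union{Missing,T}}) where {T<:Union{Number,String}}
    return combine(collect(skipmissing(x)))   # element type no longer allows missing
end

r1 = combine([1, 2, 3])
r2 = combine([1, 2, 3, missing])
r3 = combine(["1", "2", "3"])
r4 = combine(["1", "2", "3", missing])
```

The third method only strips the missing values and then re-dispatches, so the type-specific logic lives in exactly one place per element type.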
I'm having an issue with type in functions, I've managed to write the minimal code that explains the problem:
immutable Inner{B<:Real, C<:Real}
    a::B
    c::C
end
immutable Outer{T}
    a::T
end
function g(a::Outer{Inner})
    println("Naaa")
end
inner = Inner(1, 1)
outer = Outer(inner)
g(outer)
Will lead to the method error
MethodError: no method matching g(::Outer{Inner{Int64,Int64}})
So basically, I don't want to have to say what the types of Inner are, I just want the function to make sure that it's an Outer{Inner} and not Outer{Float64} or something.
Any help would be appreciated
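The type Inner{Int64,Int64} is a concrete Inner type, and it is not a subtype of
Inner{Real, Real}, since different concrete types of Inner (with Int64 or Float64 fields)
can have different representations in memory. For the same reason, Outer{Inner{Int64,Int64}} is not a subtype of Outer{Inner}: type parameters in Julia are invariant.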
According to the documentation, function g should be defined as:
function g(a::Outer{<:Inner})
    println("Naaa")
end
so that it accepts an Outer wrapping any Inner type.
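The subtype relations behind this can be checked directly. The sketch below redefines the two types with the current struct keyword so that it is self-contained; the variable names are made up for illustration:

```julia
struct Inner{B<:Real, C<:Real}
    a::B
    c::C
end

struct Outer{T}
    a::T
end

# A concrete Inner is a subtype of the unparameterized Inner...
inner_is_subtype = Inner{Int64,Int64} <: Inner
# ...but type parameters are invariant, so wrapping it in Outer breaks the relation.
invariant_match = Outer{Inner{Int64,Int64}} <: Outer{Inner}
# Outer{<:Inner} is shorthand for Outer{T} where T<:Inner, which does match.
bounded_match = Outer{Inner{Int64,Int64}} <: Outer{<:Inner}
```

inner_is_subtype and bounded_match are true while invariant_match is false, which is exactly why the original method signature produced a MethodError.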
Some examples after defining g with the <: form:
# -- With Float32 --
julia> innerf32 = Inner(1.0f0, 1.0f0)
Inner{Float32,Float32}(1.0f0, 1.0f0)
julia> outerf32 = Outer(innerf32)
Outer{Inner{Float32,Float32}}(Inner{Float32,Float32}(1.0f0, 1.0f0))
julia> g(outerf32)
Naaa
# -- With Float64 --
julia> innerf64 = Inner(1.0, 1.0)
Inner{Float64,Float64}(1.0, 1.0)
julia> outerf64 = Outer(innerf64)
Outer{Inner{Float64,Float64}}(Inner{Float64,Float64}(1.0, 1.0))
julia> g(outerf64)
Naaa
# -- With Int64 --
julia> inneri64 = Inner(1, 1)
Inner{Int64,Int64}(1, 1)
julia> outeri64 = Outer(inneri64)
Outer{Inner{Int64,Int64}}(Inner{Int64,Int64}(1, 1))
julia> g(outeri64)
Naaa
More details at the documentation: Parametric Composite Type
Update: The way to declare an immutable composite type (as in the original question) has changed to:
struct Inner{B<:Real, C<:Real}
    a::B
    c::C
end
struct Outer{T}
    a::T
end
Furthermore, function g could be declared with a parametric type:
function g(a::T) where T<:Inner
    println(a)
    println(a.a)
    println(a.c)
end
And hence, there is no need to create an instance of Outer before calling the function.
julia> ft64 = Inner(1.1, 2.2)
Inner{Float64,Float64}(1.1, 2.2)
julia> g(ft64)
Inner{Float64,Float64}(1.1, 2.2)
1.1
2.2
julia> i64 = Inner(3, 4)
Inner{Int64,Int64}(3, 4)
julia> g(i64)
Inner{Int64,Int64}(3, 4)
3
4
Here is a simple function in Julia 0.5.
function foo{T<:AbstractFloat}(x::T)
    a = zero(T)
    b = zero(T)
    return x
end
I started Julia with julia --track-allocation=user, then ran include("test.jl"), where test.jl contains only this function. I ran foo(5.), then Profile.clear_malloc_data(), then foo(5.) again in the REPL. After quitting Julia, I looked at the file test.jl.mem:
- function foo{T<:AbstractFloat}(x::T)
- a = zero(T)
194973 b = zero(T)
0 return x
- end
-
Why are there 194973 bytes of memory allocated here? This is also not the first line of the function, although after Profile.clear_malloc_data() this shouldn't matter.
Let's clarify some parts of the relevant documentation, which can be a little misleading:
In interpreting the results, there are a few important details. Under the user setting, the first line of any function directly called from the REPL will exhibit allocation due to events that happen in the REPL code itself.
Indeed, the line with allocation is not the first line. However, it is still the first tracked line, since Julia 0.5 has some issues with tracking allocation on the actual first statement (this has been fixed in v0.6). Note that it may also (contrary to what the documentation says) propagate into functions, even if they are annotated with @noinline. The only real solution is to ensure the first statement of what's being called is something you don't want to measure.
More significantly, JIT-compilation also adds to allocation counts, because much of Julia’s compiler is written in Julia (and compilation usually requires memory allocation). The recommended procedure is to force compilation by executing all the commands you want to analyze, then call Profile.clear_malloc_data() to reset all allocation counters. Finally, execute the desired commands and quit Julia to trigger the generation of the .mem files.
You're right that Profile.clear_malloc_data() prevents the allocation for JIT compilation being counted. However, this paragraph is separate from the first paragraph; clear_malloc_data does not do anything about allocation due to "events that happen in the REPL code itself".
Indeed, as I'm sure you suspected, there is no allocation in this function:
julia> function foo{T<:AbstractFloat}(x::T)
           a = zero(T)
           b = zero(T)
           return x
       end
foo (generic function with 1 method)
julia> @allocated foo(5.)
0
The numbers you see are due to events in the REPL itself. To avoid this issue, wrap the code to measure in a function. That is to say, we can use this as our test harness, perhaps after disabling inlining on foo with @noinline. For instance, here's a revised test.jl:
@noinline function foo{T<:AbstractFloat}(x::T)
    a = zero(T)
    b = zero(T)
    return x
end
function do_measurements()
    x = 0. # dummy statement
    x += foo(5.)
    x # prevent foo call being optimized out
    # (it won't, but this is good practice)
end
Then a REPL session:
julia> include("test.jl")
do_measurements (generic function with 1 method)
julia> do_measurements()
5.0
julia> Profile.clear_malloc_data()
julia> do_measurements()
5.0
Which produces the expected result:
- @noinline function foo{T<:AbstractFloat}(x::T)
0 a = zero(T)
0 b = zero(T)
0 return x
- end
-
- function do_measurements()
155351 x = 0. # dummy statement
0 x += foo(5.)
0 x # prevent foo call being optimized out
- # (it won't, but this is good practice)
- end
-
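The same "measure inside a function" idea can be sketched without the .mem files by using @allocated in a wrapper. This is a sketch assuming a current Julia version, so foo is rewritten with the where syntax; the name measure_foo is hypothetical:

```julia
@noinline function foo(x::T) where {T<:AbstractFloat}
    a = zero(T)
    b = zero(T)
    return x
end

function measure_foo()
    foo(5.0)                     # warm-up call so compilation is not measured
    return @allocated foo(5.0)   # bytes allocated by the second call
end

bytes = measure_foo()
```

Because the measurement happens inside measure_foo rather than at the top level of the REPL, allocation caused by the REPL itself is not attributed to foo.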