In GetX, a widget can easily be wrapped with Obx in the widget tree like this:
Obx (() => Container(//observable stuff in here //)),
However I have an object of a custom type that also returns a widget but cannot be wrapped in Obx. I cannot declare it as a Widget because constructor fields can't be accessed when using Widget as a generic type. I need to access object fields for logic purposes.
For example:
Widget test = MyCustomWidget(myBool: false, otherStuff: 'bla');
test.myBool // DOESN'T WORK, CAN'T ACCESS
BUT
MyCustomWidget test = MyCustomWidget(myBool: false, otherStuff: 'bla');
test.myBool // DOES WORK <- This is what I need access to.
However
Obx(() => test)
// DOESN'T WORK: A value of Obx cannot be assigned to type MyCustomWidget
// if cast: type 'Obx' is not a subtype of type 'MyCustomWidget'
My Questions:
Is there a way to wrap widgets declared as their own types with Obx?
If not, is there a way to access the constructor fields of an object declared as Widget?
As stated above, I need access to a custom widget's fields for logic to avoid huge boilerplate and dirty code.
Cheers
I know that an inline function may improve performance and cause the generated code to grow, but I'm not sure when it is correct to use one.
lock(l) { foo() }
Instead of creating a function object for the parameter and generating a call, the compiler could emit the following code. (Source)
l.lock()
try {
foo()
}
finally {
l.unlock()
}
But I found that Kotlin creates no function object for a non-inline function. Why?
/**non-inline function**/
fun lock(lock: Lock, block: () -> Unit) {
lock.lock();
try {
block();
} finally {
lock.unlock();
}
}
Let's say you create a higher order function that takes a lambda of type () -> Unit (no parameters, no return value), and executes it like so:
fun nonInlined(block: () -> Unit) {
println("before")
block()
println("after")
}
In Java parlance, this will translate to something like this (simplified!):
public void nonInlined(Function block) {
System.out.println("before");
block.invoke();
System.out.println("after");
}
And when you call it from Kotlin...
nonInlined {
println("do something here")
}
Under the hood, an instance of Function will be created here, that wraps the code inside the lambda (again, this is simplified):
nonInlined(new Function() {
@Override
public void invoke() {
System.out.println("do something here");
}
});
So basically, calling this function and passing a lambda to it will always create an instance of a Function object.
On the other hand, if you use the inline keyword:
inline fun inlined(block: () -> Unit) {
println("before")
block()
println("after")
}
When you call it like this:
inlined {
println("do something here")
}
No Function instance will be created. Instead, the body of the inlined function, together with the body of the lambda passed as block, will be copied to the call site, so you'll get something like this in the bytecode:
System.out.println("before");
System.out.println("do something here");
System.out.println("after");
In this case, no new instances are created.
Let me add: When not to use inline:
If you have a simple function that doesn't accept other functions as arguments, it does not make sense to inline it. IntelliJ will warn you:
Expected performance impact of inlining '...' is insignificant.
Inlining works best for functions with parameters of functional types
Even if you have a function "with parameters of functional types", you may encounter the compiler telling you that inlining does not work. Consider this example:
inline fun calculateNoInline(param: Int, operation: IntMapper): Int {
val o = operation //compiler does not like this
return o(param)
}
This code won't compile, yielding the error:
Illegal usage of inline-parameter 'operation' in '...'. Add 'noinline' modifier to the parameter declaration.
The reason is that the compiler is unable to inline this code, particularly the operation parameter. If operation is not wrapped in an object (which is exactly what inlining avoids), how can it be assigned to a variable at all? In this case, the compiler suggests making the argument noinline. Having an inline function whose only functional parameter is noinline does not make any sense, so don't do that. However, if there are multiple parameters of functional types, consider inlining some of them if required.
So here are some suggested rules:
You can inline when all functional type parameters are called directly or passed to other inline functions.
You should inline when the above is the case.
You cannot inline a function parameter that is assigned to a variable inside the function.
You should consider inlining if at least one of your functional type parameters can be inlined; use noinline for the others.
You should not inline huge functions; think about the generated bytecode. It will be copied to every place the function is called from.
Another use case is reified type parameters, which require you to use inline. Read here.
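As a rough illustration of the rules above, here is a minimal sketch that mixes an inlined parameter with a noinline one. The names compute, transform, onResult and pendingCallbacks are made up for this example and are not from any library.

```kotlin
// Hypothetical storage for callbacks that run later.
val pendingCallbacks = mutableListOf<(Int) -> Unit>()

inline fun compute(
    input: Int,
    transform: (Int) -> Int,           // only called directly, so it can be inlined
    noinline onResult: (Int) -> Unit   // stored as a value, so it needs a real function object
): Int {
    val result = transform(input)
    pendingCallbacks.add(onResult)     // storing the parameter compiles only because of noinline
    return result
}
```

Calling compute(20, { it * 2 }) { println(it) } returns 40 and leaves one callback in pendingCallbacks. Removing noinline from onResult should make the compiler report the illegal usage error quoted earlier.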
Use inline for preventing object creation
Lambdas are converted to classes
In Kotlin/JVM, function types (lambdas) are converted to anonymous/regular classes that extend the interface Function. Consider the following function:
fun doSomethingElse(lambda: () -> Unit) {
println("Doing something else")
lambda()
}
The function above, after compilation will look like following:
public static final void doSomethingElse(Function0 lambda) {
System.out.println("Doing something else");
lambda.invoke();
}
The function type () -> Unit is converted to the interface Function0.
Now let's see what happens when we call this function from some other function:
fun doSomething() {
println("Before lambda")
doSomethingElse {
println("Inside lambda")
}
println("After lambda")
}
Problem: objects
The compiler replaces the lambda with an anonymous object of Function type:
public static final void doSomething() {
System.out.println("Before lambda");
doSomethingElse(new Function() {
public final void invoke() {
System.out.println("Inside lambda");
}
});
System.out.println("After lambda");
}
The problem here is that, if you call this function in a loop thousands of times, thousands of objects will be created and garbage collected. This affects performance.
Solution: inline
By adding the inline keyword before the function, we can tell the compiler to copy that function's code at call-site, without creating the objects:
inline fun doSomethingElse(lambda: () -> Unit) {
println("Doing something else")
lambda()
}
This results in the copying of the code of the inline function as well as the code of the lambda() at the call-site:
public static final void doSomething() {
System.out.println("Before lambda");
System.out.println("Doing something else");
System.out.println("Inside lambda");
System.out.println("After lambda");
}
In a quick comparison with and without the inline keyword, using a million repetitions in a for loop, this roughly doubled the execution speed. So functions that take other functions as arguments tend to be faster when they are inlined.
Use inline for preventing variable capturing
When you use the local variables inside the lambda, it is called variable capturing(closure):
fun doSomething() {
val greetings = "Hello" // Local variable
doSomethingElse {
println("$greetings from lambda") // Variable capture
}
}
If our doSomethingElse() function here is not inline, the captured variables are passed to the lambda via the constructor while creating the anonymous object that we saw earlier:
public static final void doSomething() {
String greetings = "Hello";
doSomethingElse(new Function(greetings) {
public final void invoke() {
System.out.println(this.$greetings + " from lambda");
}
});
}
If many local variables are used inside the lambda, or the lambda is called in a loop, passing every local variable through the constructor adds memory overhead. Using an inline function in this case helps a lot, since the variable is used directly at the call-site.
So, as you can see from the two examples above, the big chunk of performance benefit of inline functions is achieved when the functions take other functions as arguments. This is when the inline functions are most beneficial and worth using. There is no need to inline other general functions because the JIT compiler already makes them inline under the hood, whenever it feels necessary.
Use inline for better control flow
Since a lambda passed to a non-inline function is converted to a class, we can't write a bare return statement inside the lambda:
fun doSomething() {
doSomethingElse {
return // Error: return is not allowed here
}
}
This is known as a non-local return because it would return from the enclosing function doSomething() rather than from the lambda itself. The reason for not allowing the non-local return is that the return statement exists in another class (in the anonymous class shown previously). Making the doSomethingElse() function inline solves this problem and we are allowed to use non-local returns because then the return statement is copied inside the calling function.
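A minimal sketch of a non-local return in practice. The helper forEachItem and the function firstNegative are hypothetical names for this example.

```kotlin
// Hypothetical inline helper: because it is inline, a bare `return`
// inside the lambda is allowed at the call site.
inline fun forEachItem(items: List<Int>, action: (Int) -> Unit) {
    for (item in items) action(item)
}

fun firstNegative(items: List<Int>): Int? {
    forEachItem(items) { item ->
        if (item < 0) return item   // non-local return: exits firstNegative()
    }
    return null                     // reached only when no negative item exists
}
```

If forEachItem were not inline, the return item line would be rejected by the compiler, and you would have to fall back to a labeled return plus a local variable.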
Use inline for reified type parameters
While using generics in Kotlin, we can work with the value of type T. But we can't work with the type directly, we get the error Cannot use 'T' as reified type parameter. Use a class instead:
fun <T> doSomething(someValue: T) {
println("Doing something with value: $someValue") // OK
println("Doing something with type: ${T::class.simpleName}") // Error
}
This is because the type argument that we pass to the function is erased at runtime. So, we cannot possibly know exactly which type we are dealing with.
Using an inline function along with the reified type parameter solves this problem:
inline fun <reified T> doSomething(someValue: T) {
println("Doing something with value: $someValue") // OK
println("Doing something with type: ${T::class.simpleName}") // OK
}
Inlining causes the actual type argument to be copied in place of T. So, for example, the T::class.simpleName becomes String::class.simpleName, when you call the function like doSomething("Some String"). The reified keyword can only be used with inline functions.
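To show where this pays off, here is a small sketch with two hypothetical helpers, countOfType and typeName, that both need the real type at runtime.

```kotlin
// Counts the elements that are instances of T. The `is T` check is only
// possible because T is reified.
inline fun <reified T> countOfType(items: List<Any?>): Int =
    items.count { it is T }

// Returns the simple class name of the type argument.
inline fun <reified T> typeName(): String? = T::class.simpleName
```

For example, countOfType<String>(listOf("a", 1, "b", null)) gives 2, because the call site is compiled as if it contained `it is String`.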
Avoid inline when calls are repetitive
Let's say we have the following function that is called repetitively at different abstraction levels:
inline fun doSomething() {
println("Doing something")
}
First abstraction level
inline fun doSomethingAgain() {
doSomething()
doSomething()
}
Results in:
public static final void doSomethingAgain() {
System.out.println("Doing something");
System.out.println("Doing something");
}
At the first abstraction level, the code grows to 2^1 = 2 lines.
Second abstraction level
inline fun doSomethingAgainAndAgain() {
doSomethingAgain()
doSomethingAgain()
}
Results in:
public static final void doSomethingAgainAndAgain() {
System.out.println("Doing something");
System.out.println("Doing something");
System.out.println("Doing something");
System.out.println("Doing something");
}
At the second abstraction level, the code grows to 2^2 = 4 lines.
Third abstraction level
inline fun doSomethingAgainAndAgainAndAgain() {
doSomethingAgainAndAgain()
doSomethingAgainAndAgain()
}
Results in:
public static final void doSomethingAgainAndAgainAndAgain() {
System.out.println("Doing something");
System.out.println("Doing something");
System.out.println("Doing something");
System.out.println("Doing something");
System.out.println("Doing something");
System.out.println("Doing something");
System.out.println("Doing something");
System.out.println("Doing something");
}
At the third abstraction level, the code grows to 2^3 = 8 lines.
Similarly, at the fourth abstraction level, the code grows to 2^4 = 16 lines and so on.
The number 2 is the number of times the function is called at each abstraction level. As you can see the code grows exponentially not only at the last level but also at every level, so that's 16 + 8 + 4 + 2 lines. I have shown only 2 calls and 3 abstraction levels here to keep it concise but imagine how much code will be generated for more calls and more abstraction levels. This increases the size of your app. This is another reason why you shouldn't inline each and every function in your app.
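The arithmetic can be sketched as a small calculation. It assumes, as in the example, that the innermost function is one line long and that each level calls the level below it the same number of times. The function names are made up.

```kotlin
// Lines generated for one function at the given abstraction level: calls^level.
fun linesAtLevel(callsPerLevel: Int, level: Int): Int {
    var lines = 1
    repeat(level) { lines *= callsPerLevel }
    return lines
}

// Lines generated across all levels from 1 up to `levels`.
fun totalLines(callsPerLevel: Int, levels: Int): Int =
    (1..levels).sumOf { linesAtLevel(callsPerLevel, it) }
```

With 2 calls per level and 4 levels, totalLines(2, 4) is 2 + 4 + 8 + 16 = 30. With 3 calls per level the fourth level alone is already 81 lines.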
Avoid inline in recursive cycles
Avoid using the inline function for recursive cycles of function calls as shown in the following code:
// Don't use inline for such recursive cycles
inline fun doFirstThing() { doSecondThing() }
inline fun doSecondThing() { doThirdThing() }
inline fun doThirdThing() { doFirstThing() }
This will result in a never ending cycle of the functions copying the code. The compiler gives you an error: The 'yourFunction()' invocation is a part of inline cycle.
Can't use inline when hiding implementation
The public inline functions cannot access private functions, so they cannot be used for implementation hiding:
inline fun doSomething() {
doItPrivately() // Error
}
private fun doItPrivately() { }
In the inline function shown above, accessing the private function doItPrivately() gives an error: Public-API inline function cannot access non-public API fun.
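If you still want to keep a helper out of your public API, one common workaround is to declare it internal and annotate it with @PublishedApi, which lets a public inline function call it. A minimal sketch with hypothetical function names:

```kotlin
// Not part of the public API, but callable from public inline functions.
@PublishedApi
internal fun doItInternally(): String = "done"

inline fun doSomethingPublic(block: (String) -> Unit) {
    block(doItInternally())   // allowed thanks to @PublishedApi
}
```

The helper is not truly hidden, since it ends up in the compiled output like a public function, so treat it as part of your binary contract.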
Checking the generated code
Now, about the second part of your question:
but I found that there is no function object created by kotlin for a
non-inline function. why?
The Function object is indeed created. To see the created Function object, you need to actually call your lock() function inside the main() function as follows:
fun main() {
lock(ReentrantLock()) { println("Inside the block()") } // ReentrantLock is from java.util.concurrent.locks
}
Generated class
The generated Function class doesn't reflect in the decompiled Java code. You need to directly look into the bytecode. Look for the line starting with:
final class your/package/YourFilenameKt$main$1 extends Lambda implements Function0 { }
This is the class that is generated by the compiler for the function type that is passed to the lock() function. The main$1 is the name of the class that is created for your block() function. Sometimes the class is anonymous as shown in the example in the first section.
Generated object
In the bytecode, look for the line starting with:
GETSTATIC your/package/YourFilenameKt$main$1.INSTANCE
INSTANCE is the object that is created for the class mentioned above. The created object is a singleton, hence the name INSTANCE.
That's it! Hope that provides useful insight into inline functions.
Higher-order functions are very helpful and they can really improve the reusability of code. However, one of the biggest concerns about using them is efficiency. Lambda expressions are compiled to classes (often anonymous classes), and object creation in Java is a heavy operation. We can still use higher-order functions in an effective way, while keeping all the benefits, by making functions inline.
This is where inline functions come into the picture.
When a function is marked as inline, during code compilation the compiler will replace all the function calls with the actual body of the function. Also, lambda expressions provided as arguments are replaced with their actual body. They will not be treated as functions, but as actual code.
In short: rather than being called, inline functions are replaced by the code of their body at compile time.
In Kotlin, using a function as a parameter of another function (so called higher-order functions) feels more natural than in Java.
Using lambdas has some disadvantages, though. Since they’re anonymous classes (and therefore, objects), they need memory (and might even add to the overall method count of your app).
To avoid this, we can inline our methods.
fun notInlined(getString: () -> String?) = println(getString())
inline fun inlined(getString: () -> String?) = println(getString())
In the example above, the two functions do exactly the same thing: they print the result of the getString function. One is inlined and one is not.
If you’d check the decompiled java code, you would see that the methods are completely identical. That’s because the inline keyword is an instruction to the compiler to copy the code into the call-site.
However, if we are passing any function type to another function like below:
//Compile time error… Illegal usage of inline function type ftOne...
inline fun Int.doSomething(y: Int, ftOne: Int.(Int) -> Int, ftTwo: (Int) -> Int) {
//passing a function type to another function
val funOne = someFunction(ftOne)
/*...*/
}
To solve that, we can rewrite our function as below:
inline fun Int.doSomething(y: Int, noinline ftOne: Int.(Int) -> Int, ftTwo: (Int) -> Int) {
//passing a function type to another function
val funOne = someFunction(ftOne)
/*...*/
}
Suppose we have a higher order function like below:
inline fun Int.doSomething(y: Int, noinline ftOne: Int.(Int) -> Int) {
//passing a function type to another function
val funOne = someFunction(ftOne)
/*...*/
}
Here, the compiler will tell us to not use the inline keyword when there is only one lambda parameter and we are passing it to another function. So, we can rewrite above function as below:
fun Int.doSomething(y: Int, ftOne: Int.(Int) -> Int) {
//passing a function type to another function
val funOne = someFunction(ftOne)
/*...*/
}
Note: we had to remove the noinline keyword as well, because it can be used only in inline functions!
Suppose we have a function like this:
fun intercept() {
// ...
val start = SystemClock.elapsedRealtime()
val result = doSomethingWeWantToMeasure()
val duration = SystemClock.elapsedRealtime() - start
log(duration)
// ...
}
This works fine, but the meat of the function's logic is polluted with measurement code, making it harder for your colleagues to work out what's going on. :)
Here’s how an inline function can help this code:
fun intercept() {
// ...
val result = measure { doSomethingWeWantToMeasure() }
// ...
}
inline fun <T> measure(action: () -> T): T {
val start = SystemClock.elapsedRealtime()
val result = action()
val duration = SystemClock.elapsedRealtime() - start
log(duration)
return result
}
Now I can concentrate on reading what the intercept() function’s main intention is without skipping over lines of measurement code. We also benefit from the option of reusing that code in other places where we want to
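Here is a version of the same idea that runs outside Android, as a rough sketch. It assumes System.nanoTime() in place of SystemClock, and it collects durations in a list instead of calling log(). The names measured, intercepted and recordedDurationsNanos are made up for this example.

```kotlin
// Assumption: durations are recorded here instead of being logged.
val recordedDurationsNanos = mutableListOf<Long>()

inline fun <T> measured(action: () -> T): T {
    val start = System.nanoTime()
    val result = action()                                   // the measured work
    recordedDurationsNanos.add(System.nanoTime() - start)   // record elapsed time
    return result                                           // hand the result back to the caller
}

fun intercepted(): Int {
    // The measurement code no longer clutters the main logic.
    return measured { (1..1000).sum() }
}
```

Note the explicit return type T: without it the function would be inferred as returning Unit and the return result line would not compile.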
inline lets you call a function with a lambda argument written as a block ({ ... }) and have that block compiled into the call site, rather than passing a function object like measure(myLambda).
When is this useful?
The inline keyword is useful for functions that accept other functions, or lambdas, as arguments.
Without the inline keyword on a function, that function's lambda argument gets converted at compile time to an instance of a Function interface with a single method called invoke(), and the code in the lambda is executed by calling invoke() on that Function instance inside the function body.
With the inline keyword on a function, that compile time conversion never happens. Instead, the body of the inline function gets inserted at its call site and its code is executed without the overhead of creating a function instance.
An example on Android:
Let's say we have a function in an activity router class to start an activity and apply some extras
fun startActivity(context: Context,
activity: Class<*>,
applyExtras: (intent: Intent) -> Unit) {
val intent = Intent(context, activity)
applyExtras(intent)
context.startActivity(intent)
}
This function creates an intent, applies some extras by calling the applyExtras function argument, and starts the activity.
If we look at the compiled bytecode and decompile it to Java, this looks something like:
void startActivity(Context context,
Class activity,
Function1 applyExtras) {
Intent intent = new Intent(context, activity);
applyExtras.invoke(intent);
context.startActivity(intent);
}
Let's say we call this from a click listener in an activity:
override fun onClick(v: View) {
router.startActivity(this, SomeActivity::class.java) { intent ->
intent.putExtra("key1", "value1")
intent.putExtra("key2", 5)
}
}
The decompiled bytecode for this click listener would then look like something like this:
@Override void onClick(View v) {
    router.startActivity(this, SomeActivity.class, new Function1() {
        @Override void invoke(Intent intent) {
            intent.putExtra("key1", "value1");
            intent.putExtra("key2", 5);
        }
    });
}
A new instance of Function1 gets created every time the click listener is triggered. This works fine, but it's not ideal!
Now let's just add inline to our activity router method:
inline fun startActivity(context: Context,
activity: Class<*>,
applyExtras: (intent: Intent) -> Unit) {
val intent = Intent(context, activity)
applyExtras(intent)
context.startActivity(intent)
}
Without changing our click listener code at all, we're now able to avoid the creation of that Function1 instance. The Java equivalent of the click listener code would now look something like:
@Override void onClick(View v) {
    // the inlined body uses the argument passed as context, which is `this`
    Intent intent = new Intent(this, SomeActivity.class);
    intent.putExtra("key1", "value1");
    intent.putExtra("key2", 5);
    this.startActivity(intent);
}
Thats it.. :)
To "inline" a function basically means to copy a function's body and paste it at the function's call site. This happens at compile time.
The most important case when we use the inline modifier is when we define util-like functions with parameter functions. Collection or string processing (like filter, map or joinToString) or just standalone functions are a perfect example.
This is why the inline modifier is mostly an important optimization for library developers. They should know how it works and what are its improvements and costs. We should use the inline modifier in our projects when we define our own util functions with function type parameters.
If we don't have a function type parameter or a reified type parameter, and we don't need non-local return, then we most likely shouldn't use the inline modifier. This is why we get a warning in Android Studio or IntelliJ IDEA.
Also, there is a code size problem. Inlining a large function could dramatically increase the size of the bytecode because it's copied to every call site. In such cases, you can refactor the function and extract code to regular functions.
One simple case where you might want one is when you create a util function that takes in a suspend block. Consider this.
fun timer(block: () -> Unit) {
// stuff
block()
//stuff
}
fun logic() { }
suspend fun asyncLogic() { }
fun main() {
timer { logic() }
// This is an error
timer { asyncLogic() }
}
In this case, our timer won't accept suspend functions. To solve it, you might be tempted to make it suspend as well
suspend fun timer(block: suspend () -> Unit) {
// stuff
block()
// stuff
}
But then it can only be used from coroutines/suspend functions itself. Then you'll end up making an async version and a non-async version of these utils. The problem goes away if you make it inline.
inline fun timer(block: () -> Unit) {
// stuff
block()
// stuff
}
fun main() {
// timer can be used from anywhere now
timer { logic() }
launch {
timer { asyncLogic() }
}
}
Here is a kotlin playground with the error state. Make the timer inline to solve it.
fun higherOrder(lambda: () -> Unit) {
    // invoking the lambda
    lambda()
}
inline fun inlinedHigherOrder(lambda: () -> Unit) {
    lambda()
}
// Normal function calling the higher-order function without inline
fun callingHigherOrder() {
    higherOrder { println("in lambda") }
    // Here an object is created for the lambda passed to the higher-order function
}
// Normal function calling the higher-order function with inline
fun callingInlinedHigherOrder() {
    inlinedHigherOrder { println("in lambda") }
    // Here no object is created; the contents of the lambda are copied directly into this calling function
}
Use inline if you want to avoid object creation at the call site.
With inline, the lambda becomes part of the calling function, so if there is a return inside the lambda block, the whole calling function returns. This is called a non-local return.
To forbid non-local returns, mark the lambda parameter of the higher-order function with crossinline.
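A minimal sketch of crossinline, assuming a hypothetical helper deferred that wraps the lambda in a Runnable to run it later. Since the lambda is invoked from inside another object, a non-local return would make no sense there, and the compiler requires crossinline.

```kotlin
// `block` is invoked inside the Runnable, not directly in deferred(),
// so it has to be marked crossinline.
inline fun deferred(crossinline block: () -> Unit): Runnable =
    Runnable { block() }

fun collectGreetings(): List<String> {
    val out = mutableListOf<String>()
    val task = deferred {
        // A bare `return` here would not compile; `return@deferred` would.
        out.add("hello")
    }
    task.run()   // the lambda body executes only now
    return out
}
```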
I was struggling to describe this succinctly in the title, so I'll paste in my TypeScript code that achieves what I'm talking about:
aggregate<T, A>(args: A[], invokable: (arg: A) => promise<T>): promise<T[]> {
let allPromises = new Array<promise<T>>();
for (let arg of args) {
allPromises.push(invokable(arg));
}
return promise.all(allPromises);
}
This takes a list of arguments of type A and, for each of them, invokes some function (which returns a promise that resolves to type T). Each of these promises is collected into a list, which is then all-ified and returned.
My question is, does this function already exist in Bluebird as I'd rather do things properly and use that existing, tested functionality! I had problems getting my head around some of the documentation so I might not have grokked something I should have!
Your problem is perfectly solvable with Array.prototype.map.
Your code can be turned into:
aggregate<T, A>(args: A[], invokable: (arg: A) => promise<T>): promise<T[]> {
return promise.all(args.map(invokable));
}
I hear a lot about map/reduce, especially in the context of Google's massively parallel compute system. What exactly is it?
From the abstract of Google's MapReduce research publication page:
MapReduce is a programming model and
an associated implementation for
processing and generating large data
sets. Users specify a map function
that processes a key/value pair to
generate a set of intermediate
key/value pairs, and a reduce function
that merges all intermediate values
associated with the same intermediate
key.
The advantage of MapReduce is that the processing can be performed in parallel on multiple processing nodes (multiple servers) so it is a system that can scale very well.
Since it's based on the functional programming model, the map and reduce steps have no side effects (the state and results of one subsection of a map process do not depend on another), so the data set being mapped and reduced can be split across multiple processing nodes.
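To make the model concrete, here is a single-machine sketch of the classic word count in Kotlin. It only mirrors the map, group-by-key and reduce steps described in the abstract; it is not how Google's system is implemented, and all function names are made up.

```kotlin
// Map step: one document produces intermediate (word, 1) pairs.
fun mapWords(document: String): List<Pair<String, Int>> =
    document.lowercase()
        .split(Regex("\\W+"))
        .filter { it.isNotEmpty() }
        .map { it to 1 }

// Shuffle step: group all intermediate values by their key.
fun shuffle(pairs: List<Pair<String, Int>>): Map<String, List<Int>> =
    pairs.groupBy({ it.first }, { it.second })

// Reduce step: merge all values that share one key.
fun reduceCounts(values: List<Int>): Int = values.sum()

fun wordCount(documents: List<String>): Map<String, Int> =
    shuffle(documents.flatMap { mapWords(it) })
        .mapValues { (_, counts) -> reduceCounts(counts) }
```

Because mapWords looks at one document only and reduceCounts looks at one key only, each call could in principle run on a different node.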
Joel's Can Your Programming Language Do This? piece discusses how understanding functional programming was essential in Google to come up with MapReduce, which powers its search engine. It's a very good read if you're unfamiliar with functional programming and how it allows scalable code.
See also: Wikipedia: MapReduce
Related question: Please explain mapreduce simply
Map is a function that applies another function to all the items on a list, to produce another list with all the return values on it. (Another way of saying "apply f to x" is "call f, passing it x". So sometimes it sounds nicer to say "apply" instead of "call".)
This is how map is probably written in C# (it's called Select and is in the standard library):
public static IEnumerable<R> Select<T, R>(this IEnumerable<T> list, Func<T, R> func)
{
foreach (T item in list)
yield return func(item);
}
As you're a Java dude, and Joel Spolsky likes to tell GROSSLY UNFAIR LIES about how crappy Java is (actually, he's not lying, it is crappy, but I'm trying to win you over), here's my very rough attempt at a Java version (I have no Java compiler, and I vaguely remember Java version 1.1!):
// represents a function that takes one arg and returns a result
public interface IFunctor
{
    Object invoke(Object arg);
}
public static Object[] map(Object[] list, IFunctor func)
{
    Object[] returnValues = new Object[list.length];
    for (int n = 0; n < list.length; n++)
        returnValues[n] = func.invoke(list[n]);
    return returnValues;
}
I'm sure this can be improved in a million ways. But it's the basic idea.
Reduce is a function that turns all the items on a list into a single value. To do this, it needs to be given another function func that turns two items into a single value. It would work by giving the first two items to func. Then the result of that along with the third item. Then the result of that with the fourth item, and so on until all the items have gone and we're left with one value.
In C# reduce is called Aggregate and is again in the standard library. I'll skip straight to a Java version:
// represents a function that takes two args and returns a result
public interface IBinaryFunctor
{
    Object invoke(Object arg1, Object arg2);
}
public static Object reduce(Object[] list, IBinaryFunctor func)
{
    if (list.length == 0)
        return null; // or throw something?
    if (list.length == 1)
        return list[0]; // just return the only item
    Object returnValue = func.invoke(list[0], list[1]);
    // start at index 2: the first two items were already combined above
    for (int n = 2; n < list.length; n++)
        returnValue = func.invoke(returnValue, list[n]);
    return returnValue;
}
These Java versions need generics adding to them, but I don't know how to do that in Java. But you should be able to pass them anonymous inner classes to provide the functors:
String[] names = getLotsOfNames();
String commaSeparatedNames = (String)reduce(names,
    new IBinaryFunctor() {
        public Object invoke(Object arg1, Object arg2)
        { return ((String)arg1) + ", " + ((String)arg2); }
    });
Hopefully generics would get rid of the casts. The typesafe equivalent in C# is:
string commaSeparatedNames = names.Aggregate((a, b) => a + ", " + b);
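For comparison, here is what typed versions of the two functions could look like, sketched in Kotlin rather than Java. mapList and reduceList are hypothetical names; the standard library already offers map and reduce, so these exist only to show the mechanics.

```kotlin
// Applies `transform` to every item and collects the results in a new list.
fun <T, R> mapList(items: List<T>, transform: (T) -> R): List<R> {
    val results = ArrayList<R>(items.size)
    for (item in items) results.add(transform(item))
    return results
}

// Combines all items into one value, left to right. Returns null for an empty list.
fun <T> reduceList(items: List<T>, combine: (T, T) -> T): T? {
    if (items.isEmpty()) return null
    var acc = items[0]
    for (i in 1 until items.size) acc = combine(acc, items[i])
    return acc
}
```

With generics the casts disappear: reduceList(listOf("Ann", "Bob", "Cy")) { a, b -> "$a, $b" } gives "Ann, Bob, Cy".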
Why is this "cool"? Simple ways of breaking up larger calculations into smaller pieces, so they can be put back together in different ways, are always cool. The way Google applies this idea is to parallelization, because both map and reduce can be shared out over several computers.
But the key requirement is NOT that your language can treat functions as values. Any OO language can do that. The actual requirement for parallelization is that the little func functions you pass to map and reduce must not use or update any state. They must return a value that is dependent only on the argument(s) passed to them. Otherwise, the results will be completely screwed up when you try to run the whole thing in parallel.
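A small sketch of why that matters. Because the functions below are pure, and addition is also associative, the work can be split into chunks, processed chunk by chunk and merged, with the same answer as the sequential version. The chunks here run one after another on one machine; the function names are made up.

```kotlin
// Sequential version: map every number to its square, then reduce by adding.
fun sumOfSquaresSequential(numbers: List<Int>): Int =
    numbers.map { it * it }.fold(0) { acc, x -> acc + x }

// Chunked version: each chunk is mapped and reduced on its own,
// then the partial results are reduced again.
fun sumOfSquaresChunked(numbers: List<Int>, chunkSize: Int): Int =
    numbers.chunked(chunkSize)
        .map { chunk -> chunk.map { it * it }.fold(0) { acc, x -> acc + x } } // independent per chunk
        .fold(0) { acc, partial -> acc + partial }                            // merge the partial sums
```

If the lambdas updated a shared counter instead of returning a value, the chunked version could give different results once the chunks really ran in parallel.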
After getting most frustrated with either very long waffley or very short vague blog posts I eventually discovered this very good rigorous concise paper.
Then I went ahead and made it more concise by translating it into Scala, where I've provided the simplest case, in which a user just specifies the map and reduce parts of the application. In Hadoop/Spark, strictly speaking, a more complex programming model is employed that requires the user to explicitly specify 4 more functions outlined here: http://en.wikipedia.org/wiki/MapReduce#Dataflow
import scalaz.syntax.id._
trait MapReduceModel {
type MultiSet[T] = Iterable[T]
// `map` must be a pure function
def mapPhase[K1, K2, V1, V2](map: ((K1, V1)) => MultiSet[(K2, V2)])
(data: MultiSet[(K1, V1)]): MultiSet[(K2, V2)] =
data.flatMap(map)
def shufflePhase[K2, V2](mappedData: MultiSet[(K2, V2)]): Map[K2, MultiSet[V2]] =
mappedData.groupBy(_._1).mapValues(_.map(_._2))
// `reduce` must be a monoid
def reducePhase[K2, V2, V3](reduce: ((K2, MultiSet[V2])) => MultiSet[(K2, V3)])
(shuffledData: Map[K2, MultiSet[V2]]): MultiSet[V3] =
shuffledData.flatMap(reduce).map(_._2)
def mapReduce[K1, K2, V1, V2, V3](data: MultiSet[(K1, V1)])
(map: ((K1, V1)) => MultiSet[(K2, V2)])
(reduce: ((K2, MultiSet[V2])) => MultiSet[(K2, V3)]): MultiSet[V3] =
mapPhase(map)(data) |> shufflePhase |> reducePhase(reduce)
}
// Kinda how MapReduce works in Hadoop and Spark except `.par` would ensure 1 element gets a process/thread on a cluster
// Furthermore, the splitting here won't enforce any kind of balance and is quite unnecessary anyway as one would expect
// it to already be split on HDFS - i.e. the filename would constitute K1
// The shuffle phase will also be parallelized, and use the same partition as the map phase.
abstract class ParMapReduce(mapParNum: Int, reduceParNum: Int) extends MapReduceModel {
  def split[T](splitNum: Int)(data: MultiSet[T]): Set[MultiSet[T]]

  override def mapPhase[K1, K2, V1, V2](map: ((K1, V1)) => MultiSet[(K2, V2)])
                                       (data: MultiSet[(K1, V1)]): MultiSet[(K2, V2)] = {
    val groupedByKey = data.groupBy(_._1).map(_._2)
    groupedByKey.flatMap(split(mapParNum / groupedByKey.size + 1))
      .par.flatMap(_.map(map)).flatten.toList
  }

  override def reducePhase[K2, V2, V3](reduce: ((K2, MultiSet[V2])) => MultiSet[(K2, V3)])
                                      (shuffledData: Map[K2, MultiSet[V2]]): MultiSet[V3] =
    shuffledData.map(g => split(reduceParNum / shuffledData.size + 1)(g._2).map((g._1, _)))
      .par.flatMap(_.map(reduce))
      .flatten.map(_._2).toList
}
Map is a native JS method that can be applied to an array. It creates a new array by applying a function to every element of the original array. So if you mapped function(element) { return element * 2; }, it would return a new array with every element doubled. The original array is left unmodified.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/map
Reduce is a native JS method that can also be applied to an array. It takes a function and an initial value for an accumulator, loops through each element of the array, applies the function to the accumulator and the current element, and so reduces the array to a single value (the final accumulator). It is useful because the output can be of any type; you just have to start with an accumulator of that type. So if I wanted to reduce something into an object, I would start with the accumulator {}.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/Reduce?v=a
I am trying to understand how you return non-primitives (i.e. types that do not implement Copy). If you return something like an i32, then the function creates a new value in memory with a copy of the return value, so it can be used outside the scope of the function. But if you return a type that doesn't implement Copy, it does not do this, and you get ownership errors.
I have tried using Box to create values on the heap so that the caller can take ownership of the return value, but this doesn't seem to work either.
Perhaps I am approaching this in the wrong manner by using the same coding style that I use in C# or other languages, where functions return values, rather than passing in an object reference as a parameter and mutating it, so that you can easily indicate ownership in Rust.
The following code example fails to compile. I believe the issue is only within the iterator closure, but I have included the entire function in case I am missing something.
pub fn get_files(path: &Path) -> Vec<&Path> {
    let contents = fs::walk_dir(path);
    match contents {
        Ok(c) => c.filter_map(|i| { match i {
                Ok(d) => {
                    let val = d.path();
                    let p = val.as_path();
                    Some(p)
                },
                Err(_) => None } })
            .collect(),
        Err(e) => panic!("An error occurred getting files from {:?}: {}", path, e)
    }
}
The compiler gives the following error (I have removed all the line numbers and extraneous text):
error: `val` does not live long enough
let p = val.as_path();
^~~
in expansion of closure expansion
expansion site
reference must be valid for the anonymous lifetime #1 defined on the block...
...but borrowed value is only valid for the block suffix following statement
let val = d.path();
let p = val.as_path();
Some(p)
},
You return a value by... well returning it. However, your signature shows that you are trying to return a reference to a value. You can't do that when the object will be dropped at the end of the block because the reference would become invalid.
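To make the difference concrete, here is a minimal sketch (the function names `owned_path` and `parent_of` are made up for illustration). The commented-out function is the shape the compiler rejects; the other two show the two patterns that work: returning an owned value, or returning a reference that borrows from an argument.

```rust
use std::path::{Path, PathBuf};

// Rejected by the compiler: the returned reference would point into `owned`,
// which is dropped when the function returns.
//
// fn dangling() -> &'static Path {
//     let owned = PathBuf::from("/etc/hosts");
//     owned.as_path()
// }

// Returning the owned value moves it out to the caller.
fn owned_path() -> PathBuf {
    PathBuf::from("/etc/hosts")
}

// Returning a reference is fine when it borrows from an argument,
// because the caller's value outlives the call.
fn parent_of(path: &Path) -> Option<&Path> {
    path.parent()
}
```

A caller can then write `let p = owned_path();` and pass `&p` to `parent_of`, since `p` lives in the caller's scope for as long as the returned reference is used.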
In your case, I'd probably write something like
#![feature(fs_walk)]

use std::fs;
use std::path::{Path, PathBuf};

fn get_files(path: &Path) -> Vec<PathBuf> {
    let contents = fs::walk_dir(path).unwrap();
    contents.filter_map(|i| {
        i.ok().map(|p| p.path())
    }).collect()
}

fn main() {
    for f in get_files(Path::new("/etc")) {
        println!("{:?}", f);
    }
}
The main thing is that the function returns a Vec<PathBuf>: a collection of values that own their path data, rather than references into someone else's memory.
In your code, you do let p = val.as_path(). Here, val is a PathBuf. Then you call as_path, which is defined as: fn as_path(&self) -> &Path. This means that given a reference to a PathBuf, you can get a reference to a Path that will live as long as the PathBuf will. However, you are trying to keep that reference around longer than val will exist, as val is dropped at the end of each iteration.
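The code above depends on the unstable fs_walk feature. If you are on a stable toolchain, the same idea can be sketched with std::fs::read_dir and manual recursion. This is only an illustration under that assumption: `collect_files` is a hypothetical helper name, and it returns an io::Result instead of unwrapping.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects every non-directory entry below `dir` as an owned `PathBuf`.
/// (`collect_files` is a hypothetical name, not part of the original answer.)
fn collect_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // `DirEntry::path` returns an owned `PathBuf`, so nothing here borrows
        // from `entry`, which is dropped at the end of each iteration.
        let path = entry.path();
        // `file_type` does not follow symlinks, so symlinked directories are
        // recorded as entries rather than descended into.
        if entry.file_type()?.is_dir() {
            files.extend(collect_files(&path)?);
        } else {
            files.push(path);
        }
    }
    Ok(files)
}
```

The important part is unchanged: each `PathBuf` is moved into the `Vec`, and the `Vec` is moved out to the caller, so no reference ever has to outlive the value it points at.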
How do you return non-copyable types?
By value.
fn make() -> String { "Hello, World!".into() }
There is a disconnect between:
the language semantics
the implementation details
Semantically, returning by value is moving the object, not copying it. In Rust, any object is movable and, optionally, may also be Clonable (implement Clone) and Copyable (implement Clone and Copy).
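A small sketch of the three categories, using made-up types (`Label`, `Point`) and helper names purely for illustration:

```rust
// Movable and Clone, but not Copy: it owns a heap-allocated String.
#[derive(Debug, Clone, PartialEq)]
struct Label(String);

// Copy (which requires Clone): a plain pair of integers.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Returning by value moves the Label out to the caller.
fn make_label() -> Label {
    Label("hello".to_string())
}

// Taking a Label by value moves it in; the caller can no longer use it.
fn consume(label: Label) -> usize {
    label.0.len()
}

// Taking a Point by value copies it; the caller's Point stays usable.
fn sum(point: Point) -> i32 {
    point.x + point.y
}
```

After `let a = make_label(); consume(a);` any further use of `a` is a compile error, unless you passed `a.clone()` instead. After `let p = Point { x: 1, y: 2 }; sum(p);` the variable `p` is still valid, because Copy types are duplicated rather than moved.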
That the implementation of copying or moving uses a memcpy under the hood is a detail that does not affect the semantics, only performance. Furthermore, this being an implementation detail means that it can be optimized away without affecting the semantics, which the optimizer will try very hard to do.
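One way to see that a move is cheap is the following sketch, with hypothetical helper names. Moving a String in and out of a function transfers only the small (pointer, length, capacity) handle; the heap buffer holding the text stays where it is.

```rust
// Takes ownership of `s` and hands it straight back: two moves, no clone.
fn pass_through(s: String) -> String {
    s
}

// Checks that the heap buffer is the same allocation before and after the moves.
fn buffer_survives_move() -> bool {
    let original = String::from("Hello, World!");
    let heap_before = original.as_ptr();
    let returned = pass_through(original);
    heap_before == returned.as_ptr()
}
```

`buffer_survives_move()` returns true: the text was never duplicated, only the handle that owns it changed hands.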
As for your particular code, you have a lifetime issue. You cannot return a reference to a value if said reference may outlive the value (for then, what would it reference?).
The simple fix is to return the value itself: Vec<PathBuf>. As mentioned, it will move the paths, not copy them.