How to run Julia function on specific processor using remotecall(), when the function itself does not have return - function

I tried to use remotecall() in julia to distribute work to specific processor. The function I like to run does not have any return but it will output something. I can not make it work as there is no output file after running the code.
This is the test code I am creating:
using DelimitedFiles
addprocs(4) # add 4 processors
#everywhere function test(x) # Define the function
print("hi")
writedlm(string("test",string(x),".csv"), [x], ',')
end
remotecall(test, 2, 2) # To run the function on process 2
remotecall(test, 3, 3) # To run the function on process 3
This is the output I am getting:
Future(3, 1, 67, nothing)
And there is no output file (csv), or "hi" shown
I wonder if anyone can help me with this or I did anything wrong. I am fairly new to julia and have never used parallel processing.
The background is I need to run a big simulation (A big function with bunch of includes, but no direct return outputs) lots of times, and I like to split the work to different processors.
Thanks a lot

If you want to use a module function in a worker, you need to import that module locally in that worker first, just like you have to do it in your 'root' process. Therefore your using DelimitedFiles directive needs to occur "#everywhere" first, rather than just on the 'root' process. In other words:
#everywhere using DelimitedFiles
Btw, I am assuming you're using a relatively recent version of Julia and simply forgot to add the using Distributed directive in your example.
Furthermore, when you perform a remote call, what you get back is a "Future" object, which is a way of allowing you to obtain the 'future results of that computation' from that worker, once they're finished. To get the results of that 'future computation', use fetch.
This is all very simplistic and general information, since you haven't provided a specific example that can be copy / pasted and answered specifically. Have a look at the relevant section in the manual, it's fairly clearly written: https://docs.julialang.org/en/v1/manual/parallel-computing/#Multi-Core-or-Distributed-Processing-1

Related

Save function results when a script is executed

Pretty new to python. I have a machine learning script, and what I would like to do is, every time the script is run, I would like to save the results. But what I don't understand is if all the code is in one script, how to save the results without overwriting? So for example:
auc_score = cross_val_score(logreg_model, X_RFECV, y_vars, cv=kf, scoring='roc_auc').mean()
auc_scores=[]
def auc_log():
auc_scores.append(auc_score)
return(auc_scores)
auc_log()
Everytime I run this .py file, the auc_scores list will start with blank, and the list won't update until each time the function is executed, but if you run the whole script than obvious the above will execute and start the saved list as blank again. I feel this is fairly simple, just not thinking about this properly from a continuous deployment perspective. Thanks!
It might be as well to use each result list or zero list as variables of acc_log function, which can leave all function result.
For example,
auc_score=cross_val_score(logreg_model, X_RFECV, y_vars, cv=kf, scoring='roc_auc').mean()
#if auc_score is 'int' or 'float', you must conver it to list type
auc_score_=[]
auc_score_.append(auc_score)
auc_score_zero=[]
def acu_log(acu_score_1,auc_score_2):
acu_scores=acu_score_1+auc_score_2
return acu_scores
initial_log=acu_log(auc_score_zero, auc_score_)
#print (initial_log)
second_log=acu_log(initial_log, auc_score_)
#print (second_log)
If you want to save each acc_log list on HDD after returning the result at each step, 'pickle module' is convenient for treating it.
I’m not sure this is really what you want, but hope my answer contribute to solve your question

Program Structure to meet Martin's Clean Code Function Criteria

I have been reading Clean Code by Robert C. Martin and have a basic (but fundamental) question about functions and program structure.
The book emphasizes that functions should:
be brief (like 10 lines, or less)
do one, and only one, thing
I am a little unclear on how to apply this in practice. For example, I am developing a program to:
load a baseline text file
parse baseline text file
load a test text file
parse test text file
compare parsed test with parsed baseline
aggregate results
I have tried two approaches, but neither seem to meet Martin's criteria:
APPROACH 1
setup a Main function that centrally commands other functions in the workflow. But then main() can end up being very long (violates #1), and is obviously doing many things (violates #2). Something like this:
main()
{
// manage steps, one at a time, from start to finish
baseFile = loadFile("baseline.txt");
parsedBaseline = parseFile(baseFile);
testFile = loadFile("test.txt");
parsedTest = parseFile(testFile);
comparisonResults = compareFiles(parsedBaseline, parsedTest);
aggregateResults(comparisonResults);
}
APPROACH 2
use Main to trigger a function "cascade". But each function is calling a dependency, so it still seems like they are doing more than one thing (violates #2?). For example, calling the aggregation function internally calls for results comparison. The flow also seems backwards, as it starts with the end goal and calls dependencies as it goes. Something like this:
main()
{
// trigger end result, and let functions internally manage
aggregateResults("baseline.txt", "comparison.txt");
}
aggregateResults(baseFile, testFile)
{
comparisonResults = compareFiles(baseFile, testFile);
// aggregate results here
return theAggregatedResult;
}
compareFiles(baseFile, testFile)
{
parsedBase = parseFile(baseFile);
parsedTest = parseFile(testFile);
// compare parsed files here
return theFileComparison;
}
parseFile(filename)
{
loadFile(filename);
// parse the file here
return theParsedFile;
}
loadFile(filename)
{
//load the file here
return theLoadedFile;
}
Obviously functions need to call one another. So what is the right way to structure a program to meet Martin's criteria, please?
I think you are interpreting rule 2 wrong by not taking context into account. The main() function only does one thing and that is everything, i.e. running the whole program. Let's say you have a convert_abc_file_xyz_file(source_filename, target_filename) then this function should only do the one thing its name (and arguments) implies: converting a file of format abc into one of format xyz. Of course on a lower level there are many things to be done to achieve this. For instancereading the source file (read_abc_file(…)), converting the data from format abc into format xyz (convert_abc_to_xyz(…)), and then writing the converted data into a new file (write_xyz_file(…)).
The second approach is wrong as it becomes impossible to write functions that only do one thing because every functions does all the other things in the ”cascaded” calls. In the first approach it is possible to test or reuse single functions, i.e. just call read_abc_file() to read a file. If that function calls convert_abc_to_xyz() which in turn calls write_xyz_file() that is not possible anymore.

Call a function that is not on the Matlab path WITHOUT ADDING THAT PATH

I have been searching an entire afternoon and have found no solution to call in matlab a function by specifying its path and not adding its directory to the path.
This question is quite similar to Is it possible to call a function that is not in the path in MATLAB?, but in my case, I do not want to call a built-in function, but just a normal function as defined in an m-file.
I think handles might be a solution (because apparently they can refer to functions not on the path), but I again found no way to create a handle without cd-ing to the directory, creating it there and the cd-ing back. Trying to 'explore' what a function handle object is and how to make one with a reference to a specific function not on the path has led me nowhere.
So the solution might come from two angles:
1) You know how to create a handle for an m-file in a specific directory.
2) You know a way to call a function not on the matlab path.
EDIT: I have just discovered the function functions(myhandle) which actually lets you see the filepath to which the handle is referring. But still no way to modify it though...
This is doable, but requires a bit of parsing, and a call to evalin.
I added (many years ago!) a function to the MATLAB Central File Exchange called externalFcn
http://www.mathworks.com/matlabcentral/fileexchange/4361-externalfcn
that manages calls to off-path functions. For instance, I have a function called offpathFcn that simply returns a structure with a success message, and the value of an input. Storing that function off my MATLAB path, I can call it using:
externalfcn('out = C:\MFILES_OffPath\offpathFcn(''this is a test'')');
This returns:
out =
success: 1
input: 'this is a test'
(Note that my implementation is limited, and improvable; you have to include an output with an equal sign for this to work. But it should show you how to achieve what you want.)
(MathWorks application engineer)
The solution as noted in the comment 1 to create a function handle before calling the function is nicely implemented by #Rody Oldenhuis' FEX Contribution:
http://www.mathworks.com/matlabcentral/fileexchange/45941-constructor-for-functionhandles
function [varargout]=funeval(fun,varargin)
% INPUT:
% fun: (char) full path to function file
curdir=cd;
[fundir,funname]=fileparts(fun);
cd(fundir);
[varargout{1:nargout}] =feval(funname,varargin{:})
cd(curdir);
I've modified Thierry Dalon's code to avoid the use of feval, which I always feel uncomfortable with. Note this still doesn't get around cd-ing to the directory in question, but well, it happens behind the scenes, so pretend it doesn't happen :-)
Also note what Ben Voigt pointed out above: calls to helper functions off the path will fail.
function [varargout]=funeval(FunctionHandle, FunctionPath, varargin)
% INPUT:
% FunctionHandle: handle to the function to be called; eg #MyFunction
% FunctionPath: the path to that function
% varargin: the arguments to be passed to Myfunction
curdir=cd;
cd(FunctionPath)
[varargout{1:nargout}] = FunctionHandle(varargin{:});
cd(curdir);
end
and calling it would look like
Output = funeval(#MyFunction, 'c:\SomeDirOffMatlabsPath\', InputArgToMyFunc)
The run command can run a script file from any directory, but it can't call a function (with input and output arguments).
Neither feval nor str2func permit directory information in the function string.
I suggest writing your own wrapper for str2func that:
saves the working directory
changes directory to the script directory
creates a function handle
restores the original working directory
Beware, however, that a handle to a function not in the path is likely to break, because the function will be unable to invoke any helper code stored in other files in its directory.

Organizing Notebooks & Saving Results in Mathematica

As of now I use 3 Notebook :
Functions
Where I have all the functions I created and call in the other Notebooks.
Transformation
Based on the original data, I compute transformations and add columns/List
When data is my raw data, I then call :
t1data : the result of the first transformation
t2data : the result of the second transformation
and so on,
I am yet at t20.
Display & Analysis
Using both the above I create Manipulate object that enable me to analyze the data.
Questions
Is there away to save the results of the Transformation Notebook such that t13data for example can be used in the Display & Analysis Notebooks without running all the previous computations (t1,t2,t3...t12) it is based on ?
Is there a way to use my Functions or transformed data without opening the corresponding Notebook ?
Does my separation strategy make sense at all ?
As of now I systematically open the 3 and have to run them all before being able to do anything, and it takes a while given my poor computing power and yet inefficient codes.
Saving variable states: can be done using DumpSave, Save or Put. Read back using Get or <<
You could make a package from your functions and read those back using Needs or <<
It's not something I usually do. I opt for a monolithic notebook containing everything (nicely layered with sections and subsections so that you can fold open or close) or for a package + slightly leaner analysis notebook depending on the weather and some other hidden variables.
Saving intermediate results
The native file format for Mathematica expressions is the .m file. This is human readable text format, and you can view the file in a text editor if you ever doubt what is, or is not being saved. You can load these files using Get. The shorthand form for Get is:
<< "filename.m"
Using Get will replace or refresh any existing assignments that are explicitly made in the .m file.
Saving intermediate results that are simple assignments (dat = ...) may be done with Put. The shorthand form for Put is:
dat >> "dat.m"
This saves only the assigned expression itself; to restore the definition you must use:
dat = << "dat.m"
See also PutAppend for appending data to a .m file as new results are created.
Saving results and function definitions that are complex assignments is done with Save. Examples of such assignments include:
f[x_] := subfunc[x, 2]
g[1] = "cat"
g[2] = "dog"
nCr = #!/(#2! (# - #2)!) &;
nPr = nCr[##] #2! &;
For the last example, the complexity is that nPr depends on nCr. Using Save it is sufficient to save only nPr to get a fully working definition of nPr: the definition of nCr will automatically be saved as well. The syntax is:
Save["nPr.m", nPr]
Using Save the assignments themselves are saved; to restore the definitions use:
<< "nPr.m" ;
Moving functions to a Package
In addition to Put and Save, or manual creation in a text editor, .m files may be generated automatically. This is done by creating a Notebook and setting Cell > Cell Properties > Initialization Cell on the cells that contain your function definitions. When you save the Notebook for the first time, Mathematica will ask if you want to create an Auto Save Package. Do so, and Mathematica will generate a .m file in parallel to the .nb file, containing the contents of all Initialization Cells in the Notebook. Further, it will update this .m file every time you save the Notebook, so you never need to manually update it.
Sine all Initialization Cells will be saved to the parallel .m file, I recommend using the Notebook only for the generation of this Package, and not also for the rest of your computations.
When managing functions, one must consider context. Not all functions should be global at all times. A series of related functions should often be kept in its own context which can then be easily exposed to or removed from $ContextPath. Further, a series of functions often rely on subfunctions that do not need to be called outside of the primary functions, therefore these subfunctions should not be global. All of this relates to Package creation. Incidentally, it also relates to the formatting of code, because knowing that not all subfunctions must be exposed as global gives one the freedom to move many subfunctions to the "top level" of the code, that is, outside of Module or other scoping constructs, without conflicting with global symbols.
Package creation is a complex topic. You should familiarize yourself with Begin, BeginPackage, End and EndPackage to better understand it, but here is a simple framework to get you started. You can follow it as a template for the time being.
This is an old definition I used before DeleteDuplicates existed:
BeginPackage["UU`"]
UnsortedUnion::usage = "UnsortedUnion works like Union, but doesn't \
return a sorted list. \nThis function is considerably slower than \
Union though."
Begin["`Private`"]
UnsortedUnion =
Module[{f}, f[y_] := (f[y] = Sequence[]; y); f /# Join###] &
End[]
EndPackage[]
Everything above goes in Initialization Cells. You can insert Text cells, Sections, or even other input cells without harming the generated Package: only the contents of the Initialization Cells will be exported.
BeginPackage defines the Context that your functions will belong to, and disables all non-System` definitions, preventing collisions. (There are ways to call other functions from your package, but that is better for another question).
By convention, a ::usage message is defined for each function that it to be accessible outside the package itself. This is not superfluous! While there are other methods, without this, you will not expose your function in the visible Context.
Next, you Begin a context that is for the package alone, conventionally "`Private`". After this point any symbols you define (that are not used outside of this Begin/End block) will not be exposed globally after the Package is loaded, and will therefore not collide with Global` symbols.
After your function definition(s), you close the block with End[]. You may use as many Begin/End blocks as you like, and I typically use a separate one for each function, though it is not required.
Finally, close with EndPackage[] to restore the environment to what it was before using BeginPackage.
After you save the Notebook and generate the .m package (let's say "mypackage.m"), you can load it with Get:
<< "mypackage.m"
Now, there will be a function UnsortedUnion in the Context UU` and it will be accessible globally.
You should also look into the functionality of Needs, but that is a little more advanced in my opinion, so I shall stop here.

How to define an function in the eshell (Erlang shell)?

Is there any way to define an Erlang function from within the Erlang shell instead of from an erl file (aka a module)?
Yes but it is painful. Below is a "lambda function declaration" (aka fun in Erlang terms).
1> F=fun(X) -> X+2 end.
%%⇒ #Fun <erl_eval.6.13229925>
Have a look at this post. You can even enter a module's worth of declaration if you ever needed. In other words, yes you can declare functions.
One answer is that the shell only evaluates expressions and function definitions are not expressions, they are forms. In an erl file you define forms not expressions.
All functions exist within modules, and apart from function definitions a module consists of attributes, the more important being the modules name and which functions are exported from it. Only exported functions can be called from other modules. This means that a module must be defined before you can define the functions.
Modules are the unit of compilation in erlang. They are also the basic unit for code handling, i.e. it is whole modules which are loaded into, updated, or deleted from the system. In this respect defining functions separately one-by-one does not fit into the scheme of things.
Also, from a purely practical point of view, compiling a module is so fast that there is very little gain in being able to define functions in the shell.
This depends on what you actually need to do.
There are functions that one could consider as 'throwaways', that is, are defined once to perform a test with, and then you move on. In such cases, the fun syntax is used. Although a little cumbersome, this can be used to express things quickly and effectively. For instance:
1> Sum = fun(X, Y) -> X + Y end.
#Fun<erl_eval.12.128620087>
2> Sum(1, 2).
3
defines an anonymous fun that is bound to the variable (or label) Sum. Meanwhile, the following defines a named fun, called F, that is used to create a new process whose PID (<0.80.0>) is bound to Pid. Note that F is called in a tail recursive fashion in the second clause of receive, enabling the process to loop until the message stop is sent to it.
3> Pid = spawn(fun F() -> receive stop -> io:format("Stopped~n"); Msg -> io:format("Got: ~p~n", [Msg]), F() end end).
<0.80.0>
4> Pid ! hello.
hello
Got: hello
5> Pid ! stop.
Stopped
stop
6>
However you might need to define certain utility functions that you intend to use over and over again in the Erlang shell. In this case, I would suggest using the user_default.erl file together with .erlang to automatically load these custom utility functions into the Erlang shell as soon as this is launched. For instance, you could write a function that compiles all the Erlang files in living in the current directory.
I have written a small guide on how to do this on this GitHub link. You might find it helpful and instructive.
If you want to define a function on the shell to use it as macro (because it encapsulates some functionality that you need frequently), have a look at
https://erldocs.com/current/stdlib/shell_default.html