Create n agents and calculate average number - javaagents

I want to create a system of n agents. Each agent generates a random Integer value. My goal is to calculate the average of these n numbers.
My simple idea for the algorithm:
Every agent sends a message with its number to the other agents
Every agent calculates the average
Problems:
I can't work out how to create a variable number of agents
How do I get the output result?
Does anybody know how I can do this?

The examples online tend to focus on using the Boot class:
java -cp jade.jar jade.Boot -agents agentName:org.agents.MyAgentClass
You could spawn more agents simply by adding more to the -agents command-line option, separated by semicolons. Quote the argument so the shell does not treat the semicolon as a command separator:
java -cp jade.jar jade.Boot -agents \
"agent1:org.agents.MyAgentClass;agent2:org.agents.MyAgentClass"
If you need a variable number of agents, you could move this to a bash script that appends more agents depending on a parameter.
If you really want to go crazy, you can create your own container and add agents to it from your own code and bypass the Boot class. Since your use case is so simple, I don't know that this would be a good way to go yet.
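As a rough sketch of the "script that appends more agents depending on a parameter" idea, here is the same thing in Python instead of bash. The function name build_jade_command and the agent names agent1..agentN are made up for illustration; only the jade.Boot -agents syntax comes from the answer above.

```python
import shlex


def build_jade_command(n, agent_class="org.agents.MyAgentClass", jar="jade.jar"):
    """Return the argv list that starts jade.Boot with n agents (hypothetical helper)."""
    if n < 1:
        raise ValueError("need at least one agent")
    # One "name:class" specifier per agent, joined with semicolons.
    agents = ";".join(f"agent{i}:{agent_class}" for i in range(1, n + 1))
    return ["java", "-cp", jar, "jade.Boot", "-agents", agents]


if __name__ == "__main__":
    argv = build_jade_command(3)
    # shlex.join quotes the semicolon-separated argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Passing the list to subprocess.run(argv) avoids the shell entirely, so the semicolons need no quoting in that case.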

How do I get molecular structural information from SMILES

My question is: is there any algorithm that can convert a SMILES structure into a topological fingerprint? For example, if glycerol is the input, the answer would be 3x -OH, 2x -CH2 and 1x -CH.
I'm trying to build a Python script that can predict the density of a mixture using an artificial neural network. As input I want to have the structure/fingerprint of my molecules, starting from the SMILES structure.
I'm already familiar with rdkit and the Morgan fingerprint, but that is not what I'm looking for. I'm also aware that I can use the 'matching substructure' search in rdkit, but then I would have to define all the different subgroups. Is there any more convenient/shorter way?
For most structures, there's no existing option to find the fragments. However, rdkit has a module (rdkit.Chem.Fragments) that can give you the number of fragments of a given kind, especially when the fragment is a functional group. As an example, let's say you want to find the number of aliphatic -OH groups in your molecule. You can simply call the following function to do that:
from rdkit.Chem.Fragments import fr_Al_OH
fr_Al_OH(mol)
or the following would return the number of aromatic -OH groups:
from rdkit.Chem.Fragments import fr_Ar_OH
fr_Ar_OH(mol)
Similarly, there are 83 more functions available, and some of them should be useful for your task. For the groups that have no pre-written function, you can always go to the source code of these rdkit modules, figure out how they did it, and then implement the same thing for your features. But as you already mentioned, the way to do that is to define a SMARTS string and then run a substructure match against it.
If you want to predict densities of pure components before predicting the mixtures I recommend the following paper:
https://pubs.acs.org/doi/abs/10.1021/acs.iecr.6b03809
You can use the fragments specified by rdkit as mnis proposes. Or you could specify the groups as SMARTS patterns and look for them yourself using GetSubstructMatches as you proposed yourself.
Dissecting a molecule into specific groups is not as straightforward as it might appear at first. You could also use an algorithm I published a while ago:
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-019-0382-3
It includes a list of SMARTS for the UNIFAC model, but you could also use them for other things, like density prediction.
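A minimal sketch of the SMARTS-plus-GetSubstructMatches route mentioned above, for the glycerol example from the question. The three patterns in GROUP_SMARTS and the helper name count_groups are assumptions chosen for illustration, not a vetted group-contribution scheme. Patterns like these can overlap on larger molecules.

```python
from rdkit import Chem

# Hypothetical, hand-picked group definitions; extend or replace as needed.
GROUP_SMARTS = {
    "OH": "[OX2H]",    # hydroxyl oxygen
    "CH2": "[CX4H2]",  # sp3 carbon carrying two hydrogens
    "CH": "[CX4H1]",   # sp3 carbon carrying one hydrogen
}


def count_groups(smiles, group_smarts=GROUP_SMARTS):
    """Count how often each SMARTS pattern matches in the molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    counts = {}
    for name, smarts in group_smarts.items():
        pattern = Chem.MolFromSmarts(smarts)
        counts[name] = len(mol.GetSubstructMatches(pattern))
    return counts


if __name__ == "__main__":
    print(count_groups("OCC(O)CO"))  # glycerol
```

The resulting dictionary of counts can be turned into a fixed-length vector (one entry per group) and used as network input.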

How to interpret tcl command in openOCD manual

I'm completely new to tcl and am trying to understand how to script the command "adapter usb location" in openOCD.
From the openOCD manual, the command has this description:
I want to point it to the port with the red arrow below:
Thanks.
It's not 100% clear, but I would expect (from that snippet of documentation) a bus location to be a dotted “path” something like:
1-6
where the values are:
1 — Bus ID
6 — Port ID
Which would result in a call to the command being done like this:
adapter usb location 1-6
When there's a more complex structure involved (internally because of chained hubs) such as with the item above the one you pointed at, I'd instead expect:
1-5.3
Notice that there is a sequence of port IDs (5.3) in there to represent the structure. The resulting call would then be:
adapter usb location 1-5.3
Now for the caveats!
I can't tell what the actual format of those IDs is. They might just be numbers, or they might have some textual prefix (e.g., bus1-port6). Those text prefixes, if present, might contain a space (or other metacharacter) which will be deeply annoying to use if true. You should be able to run adapter usb location without any other arguments to see what the current location is; be aware though that it might return the empty string (or give an error) if there is no current location. I welcome feedback on this, as that information appears to be not present in any online documentation I can find (and I don't have things installed so I can't just check).
I also have no idea what (if anything) to do with the device and interface IDs.

Blocking in cross validation in mlr with subject id

I have a dataset with multiple observations by participant. Participants are denoted by id. To account for this in the cross validation process, I add blocking = factor(id) to makeClassifTask() and blocking.cv = TRUE to makeResampleDesc(). However, if I leave id in the dataset, it will be used as a predictor. My question is: How do I correctly use blocking? My take would be to create a new variable, e.g. participant.id (outside of the dataset), next to remove id from the original dataset and then to use blocking = factor(participant.id), but I am not sure if this is the correct way to handle blocking.
Rather than supplying a variable for blocking you can provide a custom factor vector that specifies the observations which belong together. This is also shown in the tutorial.
This way you do not need to have the variable "participant.id" in the dataset.
Also make sure that you really want to use "blocking". Did you have a look at "grouping" already? The differences between both are also described in the linked tutorial section.

Requesting nodes by numbers and their names in SGE

How do I request a number of nodes (not procs) when submitting a job in SGE?
E.g., in TORQUE we can specify qsub -l nodes=3
How do I request nodes by name in SGE?
E.g., in TORQUE we can do this with qsub -l nodes=abc+xyz+pqr, where abc, xyz and pqr are hostnames
For a single hostname, qsub -l hostname=abc works. But how do I delimit multiple hostnames in SGE?
Requesting the number of nodes with Grid Engine is done indirectly.
When you want to submit a parallel job then you have to request
a parallel environment (man sge_pe) together with the amount
of slots (processors etc) like qsub -pe mytestpe 12...
Depending on the allocation_rule defined in the parallel environment
(qconf -sp mytestpe) the slots are distributed over one or more
nodes. If you have a so-called fixed allocation rule, where you
just set a certain number as the allocation rule, like 4 (4 slots per
host), it is easy. If you want one host, just submit with -pe mytestpe 4;
if you want 10 nodes, just submit with -pe mytestpe 40.
Node name can be requested by the -l h=abc. Since node names are
RESTRINGS (regular expression strings) in Grid Engine you can create
a regular expression for host filtering: qsub -l h="abc|xyz".
You can also create host groups (qconf -ahgrp) and request
so-called queue domains (qsub -q all.q@@mygroup).
Daniel
http://www.gridengine.eu
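The slot arithmetic above can be sketched as a small helper. This assumes a fixed allocation rule (a known number of slots per host) in the parallel environment. The function name qsub_args is made up, and combining -pe with -l h=... in one call is an illustration rather than something the answer spells out.

```python
def qsub_args(pe_name, nodes, slots_per_host, hosts=None):
    """Build qsub arguments that request `nodes` hosts via a fixed-allocation PE."""
    if nodes < 1 or slots_per_host < 1:
        raise ValueError("nodes and slots_per_host must be positive")
    # With a fixed allocation rule, requesting nodes * slots_per_host slots
    # spreads the job over exactly `nodes` hosts.
    args = ["qsub", "-pe", pe_name, str(nodes * slots_per_host)]
    if hosts:
        # Host names are regular-expression strings, so alternatives are joined with "|".
        args += ["-l", "h=" + "|".join(hosts)]
    return args


if __name__ == "__main__":
    print(qsub_args("mytestpe", 10, 4))
    print(qsub_args("mytestpe", 2, 4, hosts=["abc", "xyz"]))
```

When typing the host filter directly in a shell, keep it quoted (-l h="abc|xyz") so the pipe character is not interpreted.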
You can use -tc to limit the number of concurrent tasks (i.e., the number of slots that will be used for an array job). I use this when I submit array jobs with 100 sub-jobs to limit the impact on our queue, defaulting to 10 simultaneous jobs with -tc 10. As each task finishes, another task from the pending pool is started.
The only way I've been able to figure out to do this would be to set up specific resource quota sets (using qconf -mrqs) specifying the particular host groups you want to use. You would have to set up all of the combinations that you want first. I don't see a real reason to request specific hosts, though, unless these hosts have specific resources that you want to use (in which case I'd set up consumable resources for those, assign the appropriate number of resources to each host that can supply them, and then request those instead of specific hosts for a particular job).

Where do I get "junk" data to help test my code?

For my C class I've written a simple statistics program -- it calculates max, min, mean, etc. Anyway, I've gotten the program successfully compiled, so all I need to do now is actually test it; the only problem is that I don't have anything to test with.
In my case, I need a list of doubles -- my program needs to accept between 2 and 1,000,000; Is there some resource online that can produce lists of otherwise meaningless data? I know Lorem Ipsum gets used for typesetting, and I'm wondering if there's something similar for various types of numerical data.
Or am I out of luck, and I'll have to just create my own junk data?
The problem with testing software is not the source of the data, but the test set. I mean, can you test an int sum(int a, int b) method by just inputting random numbers to it? No, you need to know what to expect. This is a test set: inputs and expected outputs.
What do you say when you discover that 548888876+99814465=643503341? How can you tell this is the real result?
More than finding random numbers to give your program, you must somehow know the results of your computation in advance in order to compare it.
There are a few ways to do it. What I suggest is to pick a random number generator (amphetamachine +1) and run the data through both your code and a program that you already know is good, e.g. Matlab for your purposes. After computing your statistics with both, compare the results and see whether your code is correct or needs some debugging.
By the way, I deliberately altered the result of the above sum...
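A minimal sketch of that "inputs plus expected outputs" idea, using Python's standard library as the known-good reference instead of Matlab. The function name make_test_set, the value range and the seed are arbitrary choices for illustration.

```python
import random
import statistics


def make_test_set(count, low=-1000.0, high=1000.0, seed=42):
    """Return (data, expected): random doubles plus reference statistics for them."""
    rng = random.Random(seed)  # fixed seed, so the same test set can be regenerated
    data = [rng.uniform(low, high) for _ in range(count)]
    expected = {
        "min": min(data),
        "max": max(data),
        "mean": statistics.fmean(data),
    }
    return data, expected


if __name__ == "__main__":
    data, expected = make_test_set(1000)
    # Print the inputs one per line, ready to be redirected into a file for the C program.
    for value in data:
        print(repr(value))
```

Feed the printed values to the program under test and compare its max, min and mean against the expected dictionary, allowing a small tolerance for floating-point rounding.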
What about just generating a random double?
Random r = new Random();
for (int i = 0; i < 100000; i++)
{
    double number = r.NextDouble();
    // do something with the value
}
Since the data you need will depend on the program, there is no source of generic data that I know of.
If you are able to write that program, you should be able to write a script to generate dummy data for yourself.
Just use a loop to print out random numbers within the range your program can accept.
Generate a file with random bytes:
$ dd \
of=random-bytes \
if=/dev/urandom \
bs=1024 \
count=1024
http://www.generatedata.com/#generator
I've used that data generator before with some success. To be fair, it will usually involve copy/pasting the data it generates into some other format that you'll be able to read in.
You can generate your own data for this specific case quite easily, though. Loop a random number of times, up to a limit of 1,000,000, generating random doubles within the range you expect. Feed that in and away you go.
Generating your own test data in this case is probably the best option.
You could take the first million digits of pi and chop them up into however many doubles you want.
The first few could be 3.14159, 2.65358, 9.79323, 8.46264, 3.38327, 9.50288, 4.19716, and 9.39937, for example.
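The chopping can be sketched as follows. Only the first 48 digits are hard-coded here, which is enough to reproduce the eight values listed above. The six-digit width and the function name chop_digits are assumptions for illustration.

```python
# First 48 digits of pi, without the decimal point.
PI_DIGITS = "314159265358979323846264338327950288419716939937"


def chop_digits(digits, width=6):
    """Split a digit string into chunks of `width` digits and read each as d.ddddd."""
    values = []
    for start in range(0, len(digits) - width + 1, width):
        chunk = digits[start:start + width]
        # Put a decimal point after the first digit of the chunk.
        values.append(float(chunk[0] + "." + chunk[1:]))
    return values


if __name__ == "__main__":
    for value in chop_digits(PI_DIGITS):
        print(value)
```

A longer digit string (for example, one read from a file) yields proportionally more values. Any trailing digits that do not fill a whole chunk are ignored.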