ParagraphVectors in deeplearning4j - deep-learning

I am new in utilizing deeplearning4j. I am running the paragraphvector classifier on a dataset including labeled and unlabeled data, and got a result. When I run it again on the same dataset using a same configuration, I will get another results! The new results is close to the previous one, but why it generates slightly different results?! What I mean by slighltly different results is like at the first run, it detects and assigns two testing samples to the first class we have, and in the second run, it assigns those two samples or probably one of them to another class. It happens normally for just one or two maybe three samples. Maybe I needed to inform you in advance that we have three classes that they are all related to cancer types diseases.
Any hint/help/advice would be highly appreciated.
I use such a below configuration:
paragraphVectors = new ParagraphVectors.Builder()
.learningRate(0.2)
.minLearningRate(0.001)
.windowSize(2)
.iterations(3)
.batchSize(500)
.workers(4)
.stopWords(stopWords())
.minWordFrequency(10)
.layerSize(100)
.epochs(1)
.iterate(iterator)
.trainWordVectors(true)
.tokenizerFactory(tokenizerFactory)
.build();

Problem turned out to be bad input with the tokenizer.

Related

Splitting a feature collection by system index in Google Earth Engine?

I am trying to export a large feature collection from GEE. I realize that the Python API allows for this more easily than the Java does, but given a time constraint on my research, I'd like to see if I can extract the feature collection in pieces and then append the separate CSV files once exported.
I tried to use a filtering function to perform the task, one that I've seen used before with image collections. Here is a mini example of what I am trying to do
Given a feature collection of 10 spatial points called "points" I tried to create a new feature collection that includes only the first five points:
var points_chunk1 = points.filter(ee.Filter.rangeContains('system:index', 0, 5));
When I execute this function, I receive the following error: "An internal server error has occurred"
I am not sure why this code is not executing as expected. If you know more than I do about this issue, please advise on alternative approaches to splitting my sample, or on where the error in my code lurks.
Many thanks!
system:index is actually ID given by GEE for the feature and it's not supposed to be used like index in an array. I think JS should be enough to export a large featurecollection but there is a way to do what you want to do without relying on system:index as that might not be consistent.
First, it would be a good idea to know the number of features you are dealing with. This is because generally when you use size().getInfo() for large feature collections, the UI can freeze and sometimes the tab becomes unresponsive. Here I have defined chunks and collectionSize. It should be defined in client side as we want to do Export within the loop which is not possible in server size loops. Within the loop, you can simply creating a subset of feature starting from different points by converting the features to list and changing the subset back to feature collection.
var chunk = 1000;
var collectionSize = 10000
for (var i = 0; i<collectionSize;i=i+chunk){
var subset = ee.FeatureCollection(fc.toList(chunk, i));
Export.table.toAsset(subset, "description", "/asset/id")
}

Anylogic CompareRuns does not execute Main Agent

I am working on an agent-based model and now I'm trying to experiment with CompareRuns.
when I execute the experiment, it should simulate the model several times and after each simulation, a dataset of sample data should be filled.
there is also a state chart in Main agent and each state has a traceln("..."). so after passing through each state, something must be printed.
the problem is that neither the print commands return anything, nor the dataset in which I store my data returns anything but zeros.
P.S.: I also have a GIS map in my model. could that be the reason for misbehaving of Anylogic?
I found the problem and fixed it.
check and double check all the "Parameter"s in the Main agent and make sure they all have different names and labels. the problem was that two parameters had same labels and it messed up the whole experiment.
I fixed the labels and the default value of all parameters, deleted the CompareRun experiment, made a new one and it worked.

Extract aligned sections of FASTA to new file

I've already looked here and in other forums, but couldn't find the answer to my question. I want to design baits for a target enrichment Sequencing approach and have the output of a MarkerMiner search for orthologous loci from four different genomes with A. thaliana as a Reference as. These output alignments are separate Fasta-Files for each A. thaliana annotated gene with the sequences from my datasets aligned to it.
I have already run a script to filter out those loci supported to be orthologous by at least two of my four input datasets.
However, now, I'm stumped.
My alignments are gappy, since the input data is mostly RNAseq whereas the Reference contains the introns as well. So it looks like this :
AT01G1234567
ATCGATCGATGCGCGCTAGCTGAATCGATCGGATCGCGGTAGCTGGAGCTAGSTCGGATCGC
MyData1
CGATGCGCGC-----------CGGATCGCGG---------------CGGATCGC
MyData2
CGCTGCGCGC------------GGATAGCGG---------------CGGATCCC
To effectively design baits I now need to extract all the aligned parts from the file, so that I will end up with separate files; or separate alignments within the file; for the parts that are aligned between MyData and the Reference sequence with all the gappy parts excluded. There are about 1300 of these fasta files, so doing it manually is no option.
I have a bit of programming experience in python and with Linux command line tools, however I am completely lost on how to go about this. I would appreciate a hint, on what kind of tools are out there I could use or what kind of algorithm I need to come up with.
Thank you.
Cheers

Three rows of almost the same code behave differently

I have three dropdown boxes on a Main_Form. I will add the chosen content into three fields on the form, Form_Applications.
These three lines are added :
Form_Applications.Classification = Form_Main_Form.Combo43.Value
Form_Applications.Countryname_Cluster = Form_Main_Form.Combo56.Value
Form_Applications.Application = Form_Main_Form.Combo64.Value
The first two work perfectly but the last one gives error code 438!
I can enter in the immediate window :
Form_Applications.Classification = "what ever"
Form_Applications.Countryname_Cluster = "what ever"
but not for the third line. Then, after enter, the Object doesn't support this property or method error appears.
I didn't expect this error as I do exactly the same as in the first two lines.
Can you please help or do you need more info ?
In VBA Application is a special word and should not be used to address fields.
FormName.Application will return an object that points to the application instance that is running that form as opposed to an object within that form.
From the Application object you can do all sorts of other things such as executing external programs and other application level stuff like saving files/
Rename your Application field to something else, perhaps ApplicationCombo and change your line of code to match the new name. After doing this the code should execute as you expect.
Form_Applications.Application is referring to the application itself. It is not a field, so therefore it is not assignable (at least with a string).
You really haven't provided enough code to draw any real conclusions though. But looking at what you have posted, you definitely need to rethink your approach.
It's to say definitely but you are not doing the same. It looks like you are reading a ComboBox value the same (I will assume Combo64 is the same as 43 and 56) but my guess is that what you are assigning that value to is the problem:
Form_Applications.Application =
Application is not assignable. Is there another field you meant to use there?

Is it OK to have multiple assertions in a unit test when testing complex behavior?

Here is my specific scenario.
I have a class QueryQueue that wraps the QueryTask class within the ArcGIS API for Flex. This enables me to easily queue up multiple query tasks for execution. Calling QueryQueue.execute() iterate through all the tasks in my queue and call their execute method.
When all the results have been received and processed QueryQueue will dispatch the completed event. The interface to my class is very simple.
public interface IQueryQueue
{
function get inProgress():Boolean;
function get count():int;
function get completed():ISignal;
function get canceled():ISignal;
function add(query:Query, url:String, token:Object = null):void;
function cancel():void;
function execute():void;
}
For the QueryQueue.execute method to be considered successful several things must occur.
task.execute must be called on each query task once and only once
inProgress = true while the results are pending
inProgress = false when the results have been processed
completed is dispatched when the results have been processed
canceled is never called
The processing done within the queue correctly processes and packages the query results
What I am struggling with is breaking these tests into readable, logical, and maintainable tests.
Logically I am testing one state, that is the successful execution state. This would suggest one unit test that would assert #1 through #6 above are true.
[Test] public mustReturnQueryQueueEventArgsWithResultsAndNoErrorsWhenAllQueriesAreSuccessful:void
However, the name of the test is not informative as it does not describe all the things that must be true in order to be considered a passing test.
Reading up online (including here and at programmers.stackexchange.com) there is a sizable camp that asserts that unit tests should only have one assertion (as a guideline). As a result when a test fails you know exactly what failed (i.e. inProgress not set to true, completed displayed multiple times, etc.) You wind up with potentially a lot more (but in theory simpler and clearer) tests like so:
[Test] public mustInvokeExecuteForEachQueryTaskWhenQueueIsNotEmpty():void
[Test] public mustBeInProgressWhenResultsArePending():void
[Test] public mustNotInProgressWhenResultsAreProcessedAndSent:void
[Test] public mustDispatchTheCompletedEventWhenAllResultsProcessed():void
[Test] public mustNeverDispatchTheCanceledEventWhenNotCanceled():void
[Test] public mustReturnQueryQueueEventArgsWithResultsAndNoErrorsWhenAllQueriesAreSuccessful:void
// ... and so on
This could wind up with a lot of repeated code in the tests, but that could be minimized with appropriate setup and teardown methods.
While this question is similar to other questions I am looking for an answer for this specific scenario as I think it is a good representation of a complex unit testing scenario exhibiting multiple states and behaviors that need to be verified. Many of the other questions have, unfortunately, no examples or the examples do not demonstrate complex state and behavior.
In my opinion, and there will probably be many, there are a couple of things here:
If you must test so many things for one method, then it could mean your code might be doing too much in one single method (Single Responsibility Principle)
If you disagree with the above, then the next thing I would say is that what you are describing is more of an integration/acceptance test. Which allows for multiple asserts, and you have no problems there. But, keep in mind that this might need to be relegated to a separate section of tests if you are doing automated tests (safe versus unsafe tests)
And/Or, yes, the preferred method is to test each piece separately as that is what a unit test is. The closest thing I can suggest, and this is about your tolerance for writing code just to have perfect tests...Is to check an object against an object (so you would do one assert that essentially tests this all in one). However, the argument against this is that, yes it passes the one assert per test test, but you still lose expressiveness.
Ultimately, your goal should be to strive towards the ideal (one assert per unit test) by focusing on the SOLID principles, but ultimately you do need to get things done or else there is no real point in writing software (my opinion at least :)).
Let's focus on the tests you have identified first. All except the last one (mustReturnQueryQueueEventArgs...) are good ones and I could immediatelly tell what's being tested there (and that's very good sign, indicating they're descriptive and most likely simple).
The only problem is your last test. Note that extensive use of words "and", "with", "or" in test name usually rings problems bell. It's not very clear what it's supposed to do. Return correct results comes to mind first, but one might argue it's vague term? This holds true, it is vague. However you'll often find out that this is indeed pretty common requirement, described in details by method/operation contract.
In your particular case, I'd simplify last test to verify whether correct results are returned and that would be all. You tested states, events and stuff that lead to results building already, so there is no need to that again.
Now, advices in links you provided are quite good ones actually, and generally, I suggest sticking to them (single assertion for one test). The question is, what single assertion really stands for? 1 line of code at the end of test? Let's consider this simple example then:
// a method which updates two fields of our custom entity, MyEntity
public void Update(MyEntity entity)
{
entity.Name = "some name";
entity.Value = "some value";
}
This method contract is to perform those 2 operations. By success, we understand entity to be correctly updated. If one of them for some reasons fails, method as a unit is considered to fail. You can see where this is going; you'll either have two assertions or write custom comparer purely for testing purposes.
Don't be tricked by single assertion; it's not about lines of code or number of asserts (however, in majority of tests you'll write this will indeed map 1:1), but about asserting single unit (in the example above, update is considered to be an unit). And unit might be in reality multiple things that don't make any sense at all without eachother.
And this is exactly what one of questions you linked quotes (by Roy Osherove):
My guideline is usually that you test one logical CONCEPT per test. you can have multiple asserts on the same object. they will usually be the same concept being tested.
It's all about concept/responsibility; not the number of asserts.
I am not familiar with flex, but I think I have good experience in unit testing, so you have to know that unit test is a philosophy, so for the first answer, yes you can make a multiple assert but if you test the same behavior, the main point always in unit testing is to be very maintainable and simple code, otherwise the unit test will need unit test to test it! So my advice to you is, if you are new in unit testing, don't use multiple assert, but if you have good experience with unit testing, you will know when you will need to use them