different models performing very similarly(like almost the same) - deep-learning

while working on a nlp problem I decided to make few simple models with different architecture, but I realized that they are producing the same or very similar accuracies, what could be the reason for it? I am curious to know why this happened and if possible, a solution to it.
And can someone pls tell how to use from tensor_slices method
for this particular problem, coz I get running into some error whenever I try it.
[here is the link to the notebook](https://colab.research.google.com/drive/1Pbt3ga2Ptyav-UKDKY8s1AQWOfE3j4Rg?usp=sharing)
[model_0_simple_dense_layers](https://i.stack.imgur.com/u6a6W.png)
[model_1_CNN](https://i.stack.imgur.com/PyfNC.png)
[model_3_RNN](https://i.stack.imgur.com/M86FV.png)
quick summary:
`val_abuse_accuracy: 0.8037 - val_type_accuracy: 0.0368 - val_target_accuracy: 0.1074 - val_direction_accuracy: 0.9571
val_abuse_accuracy: 0.7945 - val_type_accuracy: 0.8988 - val_target_accuracy: 0.1196 - val_direction_accuracy: 0.9571
val_abuse_accuracy: 0.8067 - val_type_accuracy: 0.0123 - val_target_accuracy: 0.1074 - val_direction_accuracy: 0.9571`

Related

Best practice for interface design

I am wondering which version is the best one to implement.
The parameters are states that have 2 possible values.
This is an abstract example of the actual problem.
I am programming in a language that is procedural (without classes) and does not have typed variable.
I just read an article stating that version 1 is bad for readability and the caller. Personally I don't like version 2 either. Maybe there is a better option?
Version 1:
doSth(par1, par2)
Not redundant +
Single Method for a task +
More complex implementation -
Wrong parameters can be passed easily -
Version 2:
doSthWithPar1Is1AndPar2Is1()
doSthWithPar1Is1AndPar2Is2()
doSthWithPar1Is2AndPar2Is1()
doSthWithPar1Is2AndPar2Is2()
Redundant -
Too many methods (especially with more parameters) -
Long Method Names -
Simple implementation +
No parameters that could be passed wrong +
Given that you already have considered V1 feasible tells me, that the different argument value combinations have something in common with regards to how the values are to be processed.
In V2 you simply have to type and read more, which I'd say is the single most frequent reason for introducing errors/incorrectness and lose track of your requirements.
In V2 you have to repeat what is common in the individual implementations and if you make a mistake, the overall logic will be inconsistent at best. And if you want to fix it, you probably have to fix it in several places.
But, you can optimize code safety based on V1: choose a more "verbose" name for the procedure, like
doSomethingVerySpecificWithPar1OfTypeXAppliedToPar2OfTypeY(par1, par2)
(I am exaggerating a bit...) so you see immediately what you have originally intended.
You could even take the best out of V2 and introduce the individual functions, which simply redirect to the common function of V1 (so you avoid the redundancy). The gain in clarity almost always outweighs the slight loss of efficiency.
doSthWithPar1Is1AndPar2Is1()
{
doSomethingVerySpecificWithPar1OfTypeXAppliedToPar2OfTypeY(1, 1);
}
Always remember David Wheeler: "All problems in computer science can be solved by another level of indirection".
Btw: I don't consider long method names a problem but rather a benefit (up to a certain length of course).

Java supplier interface with org.powermock.reflect.exceptions.MethodNotFoundException

I have mock a https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/client/transport/TransportClient.java with Powermockito, before it works, but since the code chanage and add below code:
NetworkModule networkModule = new NetworkModule(settings, true, pluginsService.filterPlugins(NetworkPlugin.class), threadPool,
bigArrays, circuitBreakerService, namedWriteableRegistry, xContentRegistry, networkService);
final Transport transport = networkModule.getTransportSupplier().get();
the code alway fail at getTransportSupplier().get(), the throws exception:
Caused by: org.powermock.reflect.exceptions.MethodNotFoundException: No methods matching the name(s) get were found in the class hierarchy of class java.lang.Object.
at org.powermock.reflect.internal.WhiteboxImpl.getMethods(WhiteboxImpl.java:1720)
at org.powermock.reflect.internal.WhiteboxImpl.getMethods(WhiteboxImpl.java:1745)
at org.powermock.reflect.internal.WhiteboxImpl.getBestMethodCandidate(WhiteboxImpl.java:983)
at org.powermock.core.MockGateway$MockInvocation.findMethodToInvoke(MockGateway.java:317)
at org.powermock.core.MockGateway$MockInvocation.init(MockGateway.java:356)
at org.powermock.core.MockGateway$MockInvocation.<init>(MockGateway.java:307)
at org.powermock.core.MockGateway.doMethodCall(MockGateway.java:142)
at org.powermock.core.MockGateway.methodCall(MockGateway.java:125)
at org.elasticsearch.client.transport.TransportClient.buildTemplate(TransportClient.java:162)
networkModule.getTransportSupplier() returns Supplier.
here is code from networkModule: https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/common/network/NetworkModule.java
any idea why?
I think this is a bug with Powermock.
See that entry in their issues database; which shows the same stack trace (although for some different target). But close enough.
So, chances are, this will be fixed at some point. Or not.
Long story, short rant: I have seen too many such problems with PowerMock; thus I decided at some point that I rather fix my broken designs; instead of using the PowerMock hammer. In my experience, when we talk about "our own code"; then we simply create easy-to-test code; and then we do not need PowerMock any more. Seriously, that works. When you create your own classes, and you need PowerMock to test that; then you did something wrong! And as soon as you stop using PowerMock; you stop spending time on problems that have nothing to do with your product.
Anyway, your option space is:
Subscribe to that defect; and try to get some attention to it; the more people go there and give "me too", the more likely something will happen.
Look into your design and have a closer look why you need PowerMock(ito). And then decide if you would benefit from eliminating this requirement by changing your production code.

Status of in-place `rfft` and `irfft` in Julia

So I'm doing some hobby-related stuff which involves taking Fourier transforms of large real arrays which barely fit in memory, and was curious to see if there was an in-place version of rfft and irfft that saved RAM, since RAM consumption is important to me. These transforms are possible despite the input-vs-output-type mismatch, and require an extra row of padding.
In Implement in-place rfft! and irfft!, Tim Holy said he was working on an in-place rfft! and irfft! that made use of a buffer-containing RCpair object, but then Steven Johnson said that he was implementing something equivalent using A_mul_B!(y, plan, x), which he elaborated on here.
Things get a little weird from then on. In the documentation for both 0.3.0 and 0.4.0 there is no mention of A_mul_B!, although A_mul_B is listed. But when I try entering them into Julia, I get
A_mul_B!
A_mul_B! (generic function with 28 methods)
A_mul_B
ERROR: A_mul_B not defined
which suggests that the situation is actually the opposite of what the documentation currently describes.
So since A_mul_B! seems to exist, but isn't documented anywhere, I tried to guess how to test it in-place as follows:
A = rand(Float32, 10, 10);
p = plan_rfft(A);
A_mul_B!(A,p,A)
which resulted in
ERROR: `A_mul_B!` has no method matching A_mul_B!(::Array{Float32,2}, ::Function, ::Array{Float32,2})
So...
Are in-place real FFTs still a work in progress? Or am I using A_mul_B! wrong?
Is there a mismatch between the 0.3.0 documentation and 0.3.0's function library?
That pull request from Steven Johnson is listed as open, not merged; that means the work hasn't been finished yet. The one from me is closed, but if you want the code you can grab it by clicking on the commits.
The docs indeed omit mention of A_mul_B!. A_mul_B is equivalent to A*B, and so isn't exported independently now. A_mul_B! would be used like this: instead of C = A*B, you could say A_mul_B!(C, A, B).
Can you please edit the docs to fix these issues? (You can edit files here in your webbrowser.)

Tesseract-Job: how to parse an image in order to get the information out of it

good moring.
first of all. This is the most impressive community i ever saw!
Well several days i mused about the three-folded job of
a. getting
b. parsing
c. storing a number of pages.
Two days ago i thought that getting the pages would be the major-task. No this isnt the case - i guess that the parser-job would be a heroic task. Each of the pages that are intended to be parsed is a png-image.
So the question is - after getting all them. How to parse them!? This seems to be the issue. Guess that there are some perl-modules out there - that can help in doing this...
Well - i think that this job only can be done with some OCR embedded! Question: is there a perl-module that can be use here to support this task:
BTW: see the result-pages.
BTW;: and as i thought i can find all 790 resultpages within a certain range between
Id= 0 and Id= 100000 i thought, that i can go the way with a loop:
http://www.foundationfinder.ch/ShowDetails.php?Id=11233&InterfaceLanguage=&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=927&InterfaceLanguage=1&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=949&InterfaceLanguage=1&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=20011&InterfaceLanguage=1&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=10579&InterfaceLanguage=1&Type=Html
i thought i can go the Perl-Way but i am not very very sure:
I was trying to use LWP::UserAgent on the same URLs [see below]
with different query arguments, and i am wondering if LWP::UserAgent provides a
way for us to loop through the query arguments? I am not sure that LWP::UserAgent has a method for us to do that. Well - i sometimes heard that it is easier to use Mechanize. But is it really easier!?
But - to be frank; The first task " GETTING all the pages is not very difficult - if we compare this task with the parsing... How can this be done!?
Any ideas - suggestions -
look forward to hear from you...
zero
You do not need a Perl module, you only need the system function.
system qw[ tesseract.exe foo.png foo.txt ];
my $text = read_file('foo.txt');
You may need to preprocess the images to help Tesseract, say using ImageMagick like:
system qw[ convert.exe -resize 200% image.jpg foo.png ];

Is hard-coding literals ever acceptable?

The code base I'm currently working on is littered with hard-coded values.
I view all hard coded values as a code smell and I try to eliminate them where possible...however there are some cases that I am unsure about.
Here are two examples that I can think of that make me wonder what the best practice is:
1. MyTextBox.Text = someCondition ? "Yes" : "No"
2. double myPercentage = myValue / 100;
In the first case, is the best thing to do to create a class that allows me to do MyHelper.Yes and MyHelper.No or perhaps something similar in a config file (though it isn't likely to change and who knows if there might ever be a case where its usage would be case sensitive).
In the second case, finding a percentage by dividing by 100 isn't likely to ever change unless the laws of mathematics change...but I still wonder if there is a better way.
Can anyone suggest an appropriate way to deal with this sort of hard coding? And can anyone think of any places where hard coding is an acceptable practice?
And can anyone think of any places where hard coding is an acceptable practice?
Small apps
Single man projects
Throw aways
Short living projects
For short anything that won't be maintained by others.
Gee I've just realized how much being maintainer coder hurt me in the past :)
The real question isn't about hard coding, but rather repetition. If you take the excellent advice found in "The Pragmatic Programmer", simply Don't Repeat Yourself (DRY).
Taking the principle of DRY, it is fine to hardcode something at any point. However, once you use that particular value again, refactor so this value is only hardcoded once.
Of course hard-coding is sometimes acceptable. Following dogma is rarely as useful a practice as using your brain.
(For an example of this, perhaps it's interesting to go back to the goto wars. How many programmers do you know that will swear by all things holy that goto is evil? Why then does Steve McConnell devote a dozen pages to a measured discussion of the subject in Code Complete?)
Sure, there's a lot of hard-gained experience that tells us that small throw-away applications often mutate into production code, but that's no reason for zealotry. The agilists tell us we should do the simplest thing that could possibly work and refactor when needed.
That's not to say that the "simplest thing" shouldn't be readable code. It may make perfect sense, even in a throw-away spike to write:
const MAX_CACHE_RECORDS = 50
foo = GetNewCache(MAX_CACHE_RECORDS)
This is regardless of the fact that in three iterations time, someone might ask for the number of cache records to be configurable, and you might end up refactoring the constant away.
Just remember, if you go to the extremes of stuff like
const ONE_HUNDRED = 100
const ONE_HUNDRED_AND_ONE = 101
we'll all come to The Daily WTF and laugh at you. :-)
Think! That's all.
It's never good and you just proved it...
double myPercentage = myValue / 100;
This is NOT percentage. What you wanted to write is :
double myPercentage = (myValue / 100) * 100;
Or more correctly :
double myPercentage = (myValue / myMaxValue) * 100;
But this hard coded 100 messed with your mind... So go for the getPercentage method that Colen suggested :)
double getpercentage(double myValue, double maxValue)
{
return (myValue / maxValue) * 100;
}
Also as ctacke suggested, in the first case you will be in a world of pain if you ever need to localize these literals. It's never too much trouble to add a couple more variables and/or functions
The first case will kill you if you ever need to localize. Moving it to some static or constant that is app-wide would at least make localizing it a little easier.
Case 1: When should you hard-code stuff: when you have no reason to think that it will ever change. That said, you should NEVER hard code stuff in-line. Take the time to make static variables or global variables or whatever your language gives you. Do them in the class in question, and if you notice that two classes or areas of your code share the same value FOR THE SAME REASON (meaning it's not just coincidence), point them to the same place.
Case 2: For case case 2, you're correct: the laws of "percentage" will not change (being reasonable, here), so you can hard code inline.
Case 3: The third case is where you think the thing could change but you don't want to/have time to bother loading ResourceBundles or XML or whatever. In that case, you use whatever centralizing mechanism you can -- the hated Singleton class is a good one -- and go with that until you actually have need to deal with the problem.
The third case is tricky, though: it's extraordinarily hard to internationalize an application without really doing it... so you will want to hard-code stuff and just hope that, when the i18n guys come knocking, your code is not the worst-tasting code around :)
Edit: Let me mention that I've just finished a refactoring project in which the prior developer had placed the MySql connect strings in 100+ places in the code (PHP). Sometimes they were uppercase, sometimes they were lower case, etc., so they were hard to search and replace (though Netbeans and PDT did help a lot). There are reasons why he/she did this (a project called POG basically forces this stupidity), but there is just nothing that seems less like good code than repeating the same thing in a million places.
The better way for your second example would be to define an inline function:
double getpercentage(double myValue)
{
return(myValue / 100);
}
...
double myPercentage = getpercentage(myValue);
That way it's a lot more obvious what you're doing.
Hardcoded literals should appear in unit tests for the test values, unless there is so much reuse of a value within a single test class that a local constant is useful.
The unit tests are a description of expected values without any abstraction or redirection.
Imagine yourself reading the test - you want the information literally in front of you.
The only time I use constants for test values is when many tests repeat a value (itself a bit suspicious) and the value may be subject to change.
I do use constants for things like names of test files to compare.
I don't think that your second is really an example of hardcoding. That's like having a Halve() method that takes in a value to use to divide by; doesn't make sense.
Beyond that, example 1, if you want to change the language for your app, you don't want to have to change the class, so it should absolutely be in a config.
Hard coding should be avoided like Dracula avoids the sun. It'll come back to bite you in the ass eventually.
"hardcoding" is the wrong thing to worry about. The point is not whether special values are in code or in config files, the point is:
If the value could ever change, how much work is that and how hard is it to find? Putting it in one place and referring to that place elsewhere is not much work and therefore a way to play it safe.
Will maintainance programmers definitely understand why the value is what it is? If there is any doubt whatsoever, use a named constant that explains the meaning.
Both of these goals can be achieved without any need for config files; in fact I'd avoid those if possible. "putting stuff in config files means it's easier to change" is a myth, unless either
you actually want to support customers changing the values themselves
no value that could possibly be put in the config file can cause a bug (buffer overflow, anyone?)
your build and deployment process sucks
The text for the conditions should be in a resource file; that's what it's there for.
Not normally (Are hard-coding literals acceptable)
Another way at looking at this is how using a good naming convention
for constants used in-place of hard coded literals provides additional
documentation in the program.
Even if the number is used only once, it can still be hard to recognized
and may even be hard to find for future changes.
IMHO, making programs easier to read should be second nature to a
seasoned software professional. Raw numbers rarely communicate
meaningfully.
The extra time taken to use a well named constant will make the
code readability (easy to recall to the mind) and useful for future
re-mining (code re-use).
I tend to view it in terms of the project's scope and size.
Some simple projects that I am a solo dev on? Sure, I hard code lots of things. Tools I write that only I will ever use? Sure, if it gets the job done.
But, in working on larger, team projects? I agree, they are suspect and usually the product of laziness. Tag them for review and see if you can spot a pattern where they can be abstracted away.
In your example, the text box should be localizable, so why not a class that handles that?
Remember that you WILL forget the meaning of any non-obvious hard-coded value.
So be certain to put a short comment after each to remind you.
A Delphi example:
Length := Length * 0.3048; { 0.3048 converts feet to meters }
no.
What is a simple throw away app today will be driving your entire enterprise tomorrow. Always use best practices or you'll regret it.
Code always evolves. When you initially write stuff hard coding is the easiest way to go. Later when a need arrives to change the value it can be improved. In some cases the need never comes.
The need can arrive in many forms:
The value is used in many places and it needs to be changed by a programmer. In this case a constant is clearly needed.
User needs to be able to change the value.
I don't see the need to avoid hard coding. I do see the need to change things when there is a clear need.
Totally separate issue is that of course the code needs to be readable and this means that there might be a need for a comment for the hard coded value.
For the first value, it really depends. If you don't anticipate any kind of wide-spread adoption of your application and internationalization will never be an issue, I think it's mostly fine. However, if you are writing some kind of open source software or something with a larger audience consider the fact that it may one day need to be translated. In that case, you may be better off using string resources.
It's okay as long as you don't do refactoring, unit-testing, peer code reviews. And, you don't want repeat customers. Who cares?
I once had a boss who refused to not hardcode something because in his mind it gave him full control over the software and the items related to the software. Problem was, when the hardware died that ran the software the server got renamed... meaning he had to find his code. That took a while. I simply found a hex editor and hacked around it instead of waiting.
I normally add a set of helper methods for strings and numbers.
For example when I have strings such as 'yes' and 'no' I have a function called __ so I call __('yes'); which starts out in the project by just returning the first parameter but when I need to do more complex stuff (such as internationaizaton) it's already there and the param can be used a key.
Another example is VAT (form of UK tax) in online shops, recently it changed from 17.5% to 15%. Any one who hard coded VAT by doing:
$vat = $price * 0.175;
had to then go through all references and change it to 0.15, instead the super usefull way of doing it would be to have a function or variable for VAT.
In my opinion anything that could change should be written in a changeable way. If I find myself doing the same thing more than 5 times in the same day then it becomes a function or a config var.
Hard coding should be banned forever. Althought in you very simple examples i don't see anything wrong using them in any kind of project.
In my opinion hard coding is when you believe that a variable/value/define etc. will never change and create all your code based on that belief.
Example of such hard coding is the book Teach Yourself C in 24 Hours that everybody should avoid.