I want to compare two smoothing methods for a bigram model:
Add-one smoothing
Interpolated Absolute Discounting
For the first method, I found some code:
def calculate_bigram_probabilty(self, previous_word, word):
    bigram_word_probability_numerator = self.bigram_frequencies.get((previous_word, word), 0)
    bigram_word_probability_denominator = self.unigram_frequencies.get(previous_word, 0)
    if self.smoothing:
        bigram_word_probability_numerator += 1
        bigram_word_probability_denominator += self.unique__bigram_words
    return 0.0 if bigram_word_probability_numerator == 0 or bigram_word_probability_denominator == 0 else float(
        bigram_word_probability_numerator) / float(bigram_word_probability_denominator)
For the second method, however, I found nothing except some references to 'KneserNeyProbDist', and that is for trigrams!
How can I change the code above to calculate it? The parameters of this method must be estimated from a development set.
In this answer I only clear up a few things that I found about your problem; I can't provide a coded solution.
With KneserNeyProbDist you seem to refer to a Python implementation of that method: https://kite.com/python/docs/nltk.probability.KneserNeyProbDist
There exists an article about Kneser–Ney smoothing on wikipedia: https://en.wikipedia.org/wiki/Kneser%E2%80%93Ney_smoothing
The article above links to this tutorial: https://nlp.stanford.edu/~wcmac/papers/20050421-smoothing-tutorial.pdf. It has a small fault on its most important page, page 29; the plain text of that page is:
Modified Kneser-Ney
Chen and Goodman introduced modified Kneser-Ney:
Interpolation is used instead of backoff.
Uses a separate discount for one- and two-counts instead of a single discount for all counts.
Estimates discounts on held-out data instead of using a formula based on training counts.
Experiments show all three modifications improve performance.
Modified Kneser-Ney consistently had best performance.
Regrettably, the modified version is not explained in that document.
Luckily, the original paper by Chen & Goodman is available; Modified Kneser–Ney smoothing is explained on page 370 of this document: http://u.cs.biu.ac.il/~yogo/courses/mt2013/papers/chen-goodman-99.pdf.
I copy the most important text and formula here as screenshot:
So Modified Kneser–Ney smoothing is now known and seems to be the best solution; translating the description beside the formula into running code is the one step still to do.
It may help that, in the original linked document, further explanation follows the text shown in the screenshot and makes the raw description easier to understand.
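Returning to the method the question actually asks about, interpolated absolute discounting for a bigram model can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the Modified Kneser–Ney formula from the paper: a single discount is subtracted from every seen bigram count, the freed mass is given to an add-one smoothed unigram model, and the discount is picked by a small grid search on development-set perplexity. The class and function names and the candidate grid are hypothetical; the counters play the role of bigram_frequencies and unigram_frequencies in the question's code.

```python
import math
from collections import Counter, defaultdict


class AbsoluteDiscountingBigram:
    """Interpolated absolute discounting for bigrams (illustrative sketch)."""

    def __init__(self, tokens, discount=0.75):
        self.discount = discount  # assumed to satisfy 0 < discount <= 1
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.total = len(tokens)
        # Assumption: one extra vocabulary slot stands for any unseen word.
        self.vocab_size = len(self.unigrams) + 1
        self.context_totals = Counter()    # c(prev, *) summed over followers
        self.followers = defaultdict(set)  # distinct words seen after prev
        for (prev, word), count in self.bigrams.items():
            self.context_totals[prev] += count
            self.followers[prev].add(word)

    def unigram_prob(self, word):
        # Add-one smoothed unigram, so the lower-order model is never zero.
        return (self.unigrams.get(word, 0) + 1) / (self.total + self.vocab_size)

    def prob(self, prev, word):
        context_total = self.context_totals.get(prev, 0)
        if context_total == 0:
            # Unseen context: fall back to the unigram model entirely.
            return self.unigram_prob(word)
        d = self.discount
        discounted = max(self.bigrams.get((prev, word), 0) - d, 0.0) / context_total
        # Mass removed from the seen bigrams, redistributed via the unigram model.
        backoff_weight = d * len(self.followers[prev]) / context_total
        return discounted + backoff_weight * self.unigram_prob(word)

    def perplexity(self, tokens):
        pairs = list(zip(tokens, tokens[1:]))
        log_sum = sum(math.log(self.prob(p, w)) for p, w in pairs)
        return math.exp(-log_sum / max(len(pairs), 1))


def tune_discount(train_tokens, dev_tokens, candidates=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Pick the candidate discount with the lowest development-set perplexity."""
    return min(
        candidates,
        key=lambda d: AbsoluteDiscountingBigram(train_tokens, d).perplexity(dev_tokens),
    )
```

With a discount between 0 and 1, the probabilities for a seen context sum to one over the training vocabulary plus the single unseen-word slot, which is a quick sanity check to run before comparing perplexities against the add-one model.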
I'm here with a new question.
I'm making a custom algorithm that needs precomputed data for the graph edges. I use the AllEdgesIterator like this:
AllEdgesIterator it = graph.getAllEdges();
int nbEdges = it.getCount();
int count = 0;
int[] myData = new int[nbEdges];
while (it.next())
{
    count++;
    ...
}
The first weird thing is that nbEdges is equal to 15565 edges but count is only equal to 14417. How is that possible?
The second weird thing happens when I run my custom A*: I simply browse nodes using the outEdgeExplorer, but I get an IndexOutOfBounds exception at index 15569 on the myData array. I thought that the edge indexes were in [0 ; N-1], where N is the number of edges. Is that really the case?
What could be happening here? By the way, I have disabled graph contraction hierarchies.
Thank you for answering so fast every time!
The first weird thing is that nbEdges is equal to 15565 edges but count is only equal to 14417. How is it possible?
This is because of the 'compaction' step, in which unreachable subnetworks are removed. Currently only the nodes are removed from the graph; the edges are just disconnected and stay in the edges 'array', marked as deleted. So iter.getCount is only an upper limit, while the AllEdgesIterator correctly skips such unused edges when iterating and therefore yields the correct count. Using iter.getCount to allocate your custom data array is still the correct thing to do.
Regarding the second question: that is probably because the QueryGraph introduces new virtual edges with an edgeId bigger than iter.getCount. Depending on the exact scenario there are different solutions, like just excluding them or using the original edge instead.
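The two points above can be illustrated with a small language-neutral sketch. It does not use GraphHopper's API; the function names and the (edge_id, value) pairs are hypothetical. The array is sized by the upper bound, only the edges the iterator actually yields are filled in, and any id at or beyond the bound is treated as a virtual edge and excluded.

```python
def build_edge_data(max_edge_id, live_edges):
    """Allocate by the upper bound, then fill only the edges that were iterated.

    max_edge_id stands in for the iterator's count (an upper limit);
    live_edges is a hypothetical iterable of (edge_id, value) pairs.
    """
    data = [0] * max_edge_id
    for edge_id, value in live_edges:
        data[edge_id] = value
    return data


def lookup(data, edge_id, default=None):
    """Return the precomputed value, or default for ids outside the array.

    Ids at or beyond len(data) are assumed to be virtual edges here.
    """
    if 0 <= edge_id < len(data):
        return data[edge_id]
    return default
```

Whether to skip a virtual edge or map it back to the original edge it was derived from depends on the algorithm, as the answer says.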
How can I define an abstract odd function, say f[x]?
Whenever f[x]+f[-x] appears, Mathematica should simplify it to zero.
This can be done easily using upvalues
f[x_] + f[y_] /; x == -y ^:= 0
Normally Mathematica would try to assign the above rule to Plus, which of course does not work since that's protected. By using ^:= instead of := you can assign the rule to f. A quick check yields:
In[2]:= f[3]+f[-3]
Out[2]= 0
Edit: This, however, only works for Plus. It's probably better to use something more general, like:
f[x_?Negative] := -f[-x]
Now this also works with things like
In[4]:= -f[3] - f[-3]
Out[4]= 0
If you also want the function to work symbolically, you could add something like:
f[-a_] := -f[a]
I am not good at this, but how about using the TransformationFunctions option of Simplify?
For example, suppose you have the expression 2 Sin[x] + f[x] + 3 + f[-x] + g[x] + g[-x] and you want to simplify it, assuming f[x] is odd function and g[x] is even. Then we want a rule to say f[x]+f[-x]->0 and a rule g[x]+g[-x]->2 g[x].
Hence write
myRules[e_]:=e/.f[x]+f[-x]->0/.g[x]+g[-x]->2 g[x]
Simplify[2 Sin[x]+ f[x]+ 3 +f[-x]+ g[x] + g[-x],
TransformationFunctions->{Automatic,myRules}]
and this gives
3+2 g[x]+2 Sin[x]
By the way, in the above I am using f[x] where it really should be a pattern f[x_], so that an expression such as f[anything]+f[-anything] also becomes zero. As written, myRules only works for the exact expression f[x], so it needs to be made more general. I am not sure yet how to do that; it might need a delayed rule or similar. I will think about it more, but I hope you get the idea.
Inspired by Andre Michelle, I'm building a tone matrix in AS3.
I managed to create the matrix and generate the different sounds. They don't sound that good, but I'm getting there.
One big problem I have is that when more than one dot is set to play, it sounds just horrible. I googled a lot and found the additive synthesis method, but I don't have a clue how to apply it in AS3.
Does anybody out there know how to play multiple sounds together? Any hints?
My demo is at www.inklink.co.at/tonematrix
Oh come on, the sound was horrible...
Checked wiki? It is not that hard to understand... Even if you don't know that much of mathematics... Which you should - PROGRAMMING music is not easy.
So:
Let's first define something:
var harmonics:Array = new Array();
harmonics is the array in which we will store individual harmonics. Each child will be another array, containing ["amplitude"] (technically the volume), ["frequency"] and ["wavelength"] (period). We also need a function that can give us the phase of the wave given the amplitude, wavelength and offset (from the beginning of the wave). For square wave something like:
function getSquarePhase(amp:Number, wl:Number, off:Number):Number {
    while (off > wl){off -= wl;}
    return (off > wl / 2 ? -amp : amp); // Return amp in first half, -amp in 2.
}
You might add other types, or even custom vector waves if you want.
Now for the harder part.
var samplingFrequency:Number; // set this to your SF
function getAddSyn(harmonics:Array, time:Number):Number {
    if (harmonics.length == 1){ // We do not need to perform AS here
        return getSquarePhase(harmonics[0]["amplitude"], harmonics[0]["wavelength"], time);
    } else {
        var hs:Number = 0;
        hs += 0.5 * (harmonics[0]["amplitude"] * Math.cos(getSquarePhase(harmonics[0]["amplitude"], harmonics[0]["wavelength"], time)));
        // ^ You can try to remove the line above if it does not sound right.
        for (var i:int = 1; i < harmonics.length; i++){
            // Use harmonic i here; indexing [0] would add the same term on every pass.
            hs += harmonics[i]["amplitude"] * Math.cos(getSquarePhase(harmonics[i]["amplitude"], harmonics[i]["wavelength"], time)) * Math.cos((Math.PI * 2 * harmonics[i]["frequency"] / samplingFrequency) * time);
            hs -= Math.sin(getSquarePhase(harmonics[i]["amplitude"], harmonics[i]["wavelength"], time)) * Math.sin((Math.PI * 2 * harmonics[i]["frequency"] / samplingFrequency) * time);
        }
        return hs;
    }
}
This is all just converted (weakly :D) from Wikipedia, so I may have made a mistake somewhere in there... But I think you should get the idea... And if not, try to convert the AS from Wikipedia yourself; as I said, it is not so hard.
I also somehow ignored the Nyquist frequency...
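Separately from the harmonic code above, the narrower question of playing several notes at once usually comes down to summing the sample values of the individual tones and scaling the sum so it stays in range; a sum that exceeds the valid range gets clipped, which is one plausible cause of the harsh sound. The sketch below is in Python rather than AS3 and assumes plain sine tones, a 44100 Hz sample rate and samples in [-1, 1]; all names are hypothetical.

```python
import math

SAMPLE_RATE = 44100  # assumed sampling frequency in Hz


def sine_sample(freq_hz, amplitude, n):
    """Value of one sine tone at sample index n."""
    return amplitude * math.sin(2.0 * math.pi * freq_hz * n / SAMPLE_RATE)


def mix(tones, num_samples):
    """Sum several (frequency, amplitude) tones into one buffer.

    The sum is scaled by the total amplitude whenever that exceeds 1,
    so every mixed sample stays within [-1, 1] and cannot clip.
    """
    if not tones:
        return [0.0] * num_samples
    scale = 1.0 / max(1.0, sum(abs(amp) for _, amp in tones))
    return [
        scale * sum(sine_sample(freq, amp, n) for freq, amp in tones)
        for n in range(num_samples)
    ]
```

Scaling by the total amplitude is the most conservative choice; it makes dense chords quieter, so a fixed per-voice gain or a limiter may sound better in practice.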
I have tried your demo and thought it sounded pretty good actually. What do you mean it doesn't sound that good? What's missing? My main area of interest is music and I haven't found anything wrong; it's only a little frustrating because, after creating a sequence, I feel the need to add new sounds! Had I been able to record what I was playing with, I would have sent it to you.
Going into additive synthesis doesn't look like a light undertaking though. How far do you want to push it, would you want to create some form of synthesizer?
In the case of languages that support single decision and action without brackets, such as the following example:
if (var == true)
    doSomething();
What is the preferred way of writing this? Should brackets always be used, or should their usage be left as a preference of the individual developer? Additionally, does this practice depend on the size of the code block, such as in the following example:
if (var == 1)
    doSomething(1);
else if (var > 1 && var < 10)
    doSomething(2);
else
{
    validate(var);
    doSomething(var);
}
There isn't really a right answer. This is what coding standards within the company are for. If you can keep it consistent across the whole company then it will be easy to read. I personally like
if ( a == b) {
    doSomething();
}
else {
    doSomething();
}
but this is a holy war.
I recommend
if(a==b)
{
    doSomething();
}
because I find it far easier to do it up-front than to try to remember to add the braces when I add a second statement to the success condition...
if(a==b)
    doSomething();
    doSomethingElse();
is very different to
if(a==b)
{
    doSomething();
    doSomethingElse();
}
See Joel's article for further details.
I tend to use braces at all times. You can get some subtle bugs where you started off with something like:
if(something)
    DoOneThing();
else
    DoItDifferently();
and then decide to add another operation to the else clause and forget to wrap it in braces:
if(something)
    DoOneThing();
else
    DoItDifferently();
    AlwaysGetsCalled();
AlwaysGetsCalled() will always get called, and if you're sitting there at 3am wondering why your code is behaving all strange, something like that could elude you for quite some time. For this reason alone, I always use braces.
My preference is to be consistent, e.g., if you use brackets on one block, use brackets all throughout even with just one statement:
if (cond1)
{
    SomeOperation();
    Another();
}
else if (cond2)
{
    DoSomething();
}
else
{
    DoNothing();
    DoAnother();
}
But if you have just a bunch of one liners:
if (cond1)
    DoFirst();
else if (cond2)
    DoSecond();
else
    DoElse();
Looks cleaner (if you don't mind the dummy method names ;) that way, but that's just me.
This also applies to loop constructs and the like:
foreach (var s in Something)
    if (s == someCondition)
        yield return SomeMethod(s);
You should also consider that this is a convention that might be more suited to .NET (notice that Java peepz like to have their first curly brace in the same line as the if).
Chalk this one up to lack of experience, but during my seven-year stint as a code monkey I've never actually seen anyone make the mistake of not adding braces when adding code to a block that doesn't have braces. That's precisely zero times.
And before the wisecrackers get to it, no, the reason wasn't "everyone always uses braces".
So, an honest question -- I really would like to get actual replies instead of just downvotes: does that ever actually happen?
(Edit: I've heard enough outsourcing horror stories to clarify a bit: does it ever actually happen to competent programmers?)
It doesn't really matter, as long as you're consistent with it.
There does seem to be a tendency to demand sameness within a single statement, i.e. if there's brackets in one branch, there's brackets everywhere. The Linux kernel coding standards, for one, mandate that.
I would strongly advocate always using braces, even when they're optional. Why? Take this chunk of C++ code:
if (var == 1)
    doSomething();
doSomethingElse();
Now, someone comes along who isn't really paying enough attention and decides that something extra needs to happen if (var == 1), so they do this:
if (var == 1)
    doSomething();
    doSomethingExtra();
doSomethingElse();
It's all still beautifully indented but it won't do what was intended.
By always using braces, you're more likely to avoid this sort of bug.
I personally side with McConnell's explanation from Code Complete.
Use them whenever you can. They enhance your code's readability and remove the few and scarce confusions that might occur.
There is one thing that's more important though: consistency. Whichever style you use, make sure you always do it the same way.
Start writing stuff like:
If A == true
    FunctA();
If B == "Test"
{
    FunctB();
}
You are bound to end up hunting for an odd bug where the compiler doesn't understand what you were trying to do, and that will be hard to find.
Basically, find the one you are comfortable writing every time and stick to it. I do believe that using the block delimiters ('{', '}') as much as possible is the way to go.
I don't want to start a question inside another, but there is something related to this that I want to mention to get your mental juices going. Once the decision to use brackets has been made, where do you put the opening bracket? On the same line as the statement or underneath? Indented brackets or not?
If A == false {
    //calls and whatnot
}
//or
If B == "BlaBla"
{
    //calls and whatnot
}
//or
If C == B
    {
    //calls and whatnot
    }
Please don't answer this here, since it would be a new question. If I see interest in this, I will open a new question for your input.
I've always used brackets at all times, except for the case where I'm checking a variable for NULL before freeing it, as is often done in C.
In that case, I make sure it's clear that it's a single statement by keeping everything on one line, like this:
if (aString) free(aString);
There is no right or wrong way to write the above statement. There are plenty of accepted coding styles. However, I prefer keeping the coding style consistent throughout the entire project, i.e. if the project is using K&R style, you should use K&R.
Ruby nicely obviates one issue in the discussion. The standard for a one-liner is:
do_something if (a == b)
and for a multi-line:
if (a == b)
  do_something
  do_something_else
end
This allows concise one-line statements, but it forces you to reorganize the statement if you go from single- to multi-line.
This is not (yet) available in Java, nor in many other languages, AFAIK.
As others have mentioned, doing an if statement in two lines without braces can lead to confusion:
if (a == b)
    DoSomething();
    DoSomethingElse(); // <-- outside the if statement
so I place it on a single line if I can do so without hurting readability:
if (a == b) DoSomething();
and at all other times I use braces.
Ternary operators are a little different. Most of the time I do them on one line:
var c = (a == b) ? DoSomething() : DoSomethingElse();
but sometimes the statements have nested function calls, or lambda expressions which
make a one-line statement difficult to parse visually, so I prefer something like this:
var c = (a == b)
    ? AReallyReallyLongFunctionName()
    : AnotherReallyReallyLongFunctionOrStatement();
Still more concise than an if/else block but easy to see what's going on.
Sun's Code Conventions for the Java Programming Language has this to say:
The if-else class of statements should have the following form:
if (condition) {
    statements;
}
if (condition) {
    statements;
} else {
    statements;
}
if (condition) {
    statements;
} else if (condition) {
    statements;
} else {
    statements;
}
Our boss makes us put { } after a decision statement no matter what, even if it's a single statement. It's really annoying to add two extra lines. The only exception is ternary operators.
I guess it's a good thing I have my code monitor in portrait orientation at 1200x1600.
I prefer
if (cond)
    {
    //statement
    }
even with only a single statement. If you were going to write something once, had no doubts that it worked, and never planned on another coder ever looking at that code, go ahead and use whatever format you want. But, what does the extra bracketing really cost you? Less time in the course of a year than it takes to type up this post.
Yes, I like to indent my brackets to the level of the block, too.
Python is nice in that the indentation defines the block. The question is moot in a language like that.
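A small sketch of that point: in Python the indentation itself decides which statements belong to the if, so code cannot be indented one way and grouped another. The function name and log strings are made up for illustration.

```python
def run(i, log):
    """Append markers to log to show which statements belong to the if block."""
    if i != 0:
        log.append("bar")
        log.append("foo")   # indented: part of the if block
    log.append("done")      # dedented: runs on every call
    return log
```

Calling run(0, []) records only "done", while run(1, []) records all three markers, so what the reader sees is what the interpreter executes.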
I tend to agree with Joel Spolsky on this one; his article (Making Wrong Code Look Wrong) has the following code example:
if (i != 0)
    bar(i);
    foo(i);
foo(i) is now unconditional, which is really bad!
I always use brackets for decision statements. It helps code maintainability and it makes the code less bug prone.
I use curly braces around every statement if and only if at least one of them requires it.
In Perl, if you are doing a simple test, sometimes you will write it in this form:
do_something if condition;
do_something unless condition;
Which can be really useful to check the arguments at the start of a subroutine.
sub test{
    my($self,@args) = @_;
    return undef unless defined $self;
    # rest of code goes here
}
The golden rule is that, when working in an existing project, follow those coding standards.
When I'm at home, I have two forms.
The first is the single line:
if (condition) doThis();
and the second is for multiple lines:
if (condition) {
    doThis();
}
I used to follow the "use curly braces always" line like an apparatchik. However, I've modified my style to allow for omitting them on single line conditional expressions:
if(!ok)return;
For any multistatement scenario though I'm still of the opinion that braces should be mandatory:
if(!ok){
    do();
    that();
    thing();
}