Stanford NLP chunking for nested grammar - nltk

I made a switch from nltk to stanford-nlp and am finding trouble getting chunk results. A simple nested chunk grammar like
JJI:{(<JJ.*|VB.*>*)}
JJJ:{<RB.*>*(<TO>)?<JJI>?}
works like a charm in NLTK with easy code
cp = nltk.RegexpParser(grammar)
result = cp.parse(sentence)
whereas its overly complicated in Stanford-NLP
TokenSequencePattern.getMultiPatternMatcher failed to give results for nested grammer
I turned to CoreMapExpressionExtractor.createExtractorFromFiles which required extensive configurations with ENV.defaults["stage"] and action: result: properties and intensive debugging. After much struggle I turned here to SO.
Is there no simpler way/api in Stanford-NLP like nltk.RegexpParser to perform such an elementary task?
Also if CoreMapExpressionExtractor.createExtractorFromFiles is the only way forward could someone please advice me on how the NERRulesFile.rules files should look like for this usecase?

Related

For loop representation in Chisel (#Normalization in Float Adder)

I try to code floating adder;
https://github.com/ElectronNest/FPU/blob/master/FloatAdd.scala
This is half way.
The normalization is huge code part, so I would like to use for-loop or some equivalent representation method.
Is it possible to use loop or we need strict coding?
Best,
S.Takano
This is a very general and large question. The equivalent of a for loop in hardware can be implemented using a number of techniques, pretty much all of them involving registers to hold state information. Looking at your code I would suggest that you start a little smaller and work on syntax, I see many syntax errors currently. I use IntelliJ community edition as an editor because it does a great job with helping to get the code properly structured. I also would strongly recommend starting from the chisel-template repository. It has the proper layout and examples of a working circuit and unit testing harness. Then start with a smaller implementation that does something simple like just pass input to output and runs in a test harness, then slowly build up the circuit to achieve your goals.
Good luck!
Welcome and thank you for your interest in Chisel!
I would like to echo Chick's suggestion to start from something small that compiles and simulates and build up from there. In particular, the linked code above conflates some Scala vs. Chisel constructs (eg. Scala's if else, vs. Chisel's when, .elsewhen, .otherwise), as well as some Verilog vs. Chisel concepts (eg. bit indexing with [high:low] vs. Chisel's (high, low))
In case you haven't seen it, I would suggest taking a look at the Chisel Bootcamp which helps explain how to use constructs like for loops to generate hardware.
I'll also plug my own responses to this question on the chisel-users mailing list where I tried to explain some of the intuition behind writing Chisel generators, including differentiating if and when and using for loops.

Weka: Limitations on what one can output as source?

I was consulting several references to discover how I may output trained Weka models into Java source code so that I may use the classifiers I am training in actual code for research applications I have been developing.
As I was playing with Weka 3.7, I noticed that while it does output Java code to its main text buffer when use simpler classification (supervised in my case this time) methods such as J48 decision tree, it removes the option (rather, it voids it by removing the ability to checkmark it and fades the text) to output Java code for RandomTree and RandomForest (which are the ones that give me the best performance in my situation).
Note: I am clicking on the "More Options" button and checking "Output source code:".
Does Weka not allow you to output RandomTree or RandomForest as Java code? If so, why? Or if it does and just doesn't put it in the output buffer (since RF is multiple decision trees which I imagine it doesn't want to waste buffer space), how does one go digging up where in the file system Weka outputs java code by default?
Are there any tricks to get Weka to give me my trained RandomForest as Java code? Or is Serialization of the output *.model files my only hope when it comes to RF and RandomTree?
Thanks in advance to those who provide help.
NOTE: (As an addendum to the answer provided below) If you run across a similar situation (requiring you to use your trained classifier/ML model in your code), I recommend following the links posted in the answer that was provided in response to my question. If you do not specifically need the Java code for the RandomForest, as an example, de-serializing the model works quite nicely and fits into Java application code, fulfilling its task as a trained model/hardened algorithm meant to predict future unlabelled instances.
RandomTree and RandomForest can't be output as Java code. I'm not sure for the reasoning why, but they don't implement the "Sourceable" interface.
This explains a little about outputting a classifier as Java code: Link 1
This shows which classifiers can be output as Java code: Link 2
Unfortunately I think the easiest route will be Serialization, although, you could maybe try implementing "Sourceable" for other classifiers on your own.
Another, but perhaps inconvenient solution, would be to use Weka to build the classifier every time you use it. You wouldn't need to load the ".model" file, but you would need to load your training data and relearn the model. Here is a starters guide to building classifiers in your own java code http://weka.wikispaces.com/Use+WEKA+in+your+Java+code.
Solved the problem for myself by turning the output of WEKA's -printTrees option of the RandomForest classifier into Java source code.
http://pielot.org/2015/06/exporting-randomforest-models-to-java-source-code/
Since I am using classifiers with Android, all of the existing options had disadvantages:
shipping Android apps with serialized models didn't reliably work across devices
computing the model on the phone took too much resources
The final code will consist of three classes only: the class with the generated model + two classes to make the classification work.

Jenkins API tree filter using regex?

I'm using the tree parameter to filter JSON data coming back from the API and that works great. My issue is I need to fetch some specific data from an array with a bunch of stuff I don't care about. I'm wondering if there is a way, using the tree command, to filter using a regex or contains string?
For example, to give me back all fileNames that start with MyProject:
http://myapi.com?tree=fileName=MyProject*
Regular expressions are great for regular grammars.
Trees tend to follow context free grammars. You might do a lot better with a language that can support context aware operations, like XPath. Yes, a few very simple items might work without the extra features of XPath; however, once you do step on a use case that is beyond what is possible with regular grammars (they only support a small subset of what can be searched), it is literally impossible to accomplish the search with the tool in hand.
If you care to see how regular grammars tend to come with limitations, study the pumping lemma, and then think deeply about it's implications. A quick brush-up on parsing theory might also be useful. You are up against mathematics, including the parts of mathematics which include logical operations. It's not a matter of being a difficult problem to solve, it has been proven that regular expressions cannot match context free grammars.
If you are just more interested in getting the job done quickly. I suggest you start off by reading up on XPath and try to leverage one of the already available tools, or at least use it as a guide in your tree matching efforts.
I found that using not using the JSON and instead switching to XML you can filter with XPATH. An example on finding all job urls with name starting with "Test" is this:
https://{jenkins_instance_url}/view/All/api/xml?tree=jobs[name,url]&xpath=/*/job[starts-with(name,'Test')]/url&wrapper=jobs

Creating a language on the Rubinius VM

I'm looking to play around with the Rubinius VM to create a langauage, but just reading the documentation, I'm still quite lost on how to get started. Even looking at the projects, I still can't seem to figure out where the parsing and using the vm comes into place. Does anyone have any resources for this?
Hey I'm a contributor to the Fancy language that runs on rubinius. If you're interested in parsing take a look at boot/rbx-compiler there you'll find a Parser (implemented with KPEG) that basically constructs a tree of AST nodes, each of those nodes has a bytecode method that produces the rubinius vm instructions for everything to work. Fancy share a lot of semantics with ruby, so I guess starting with it would be easy if you're already familiar with ruby. You'll just need to checkout the examples/ dir to geet a feeling on the language and then the kpeg parser, ast nodes, loader, as you progress exploring the compiler. These days Fancy is bootstrapped (that means that the compiler has been written in fancy itself - at lib/compiler) but rbx-compiler is the first step in that process.
Hope exploring Fancy's source code can be of help to you.
In case you hadn't seen it, check out Evan's keynote from 2011 LA Ruby Conf. He shows how to build a simple language, which might be helpful.

Why don't I see pipe operators in most high-level languages?

In Unix shell programming the pipe operator is an extremely powerful tool. With a small set of core utilities, a systems language (like C) and a scripting language (like Python) you can construct extremely compact and powerful shell scripts, that are automatically parallelized by the operating system.
Obviously this is a very powerful programming paradigm, but I haven't seen pipes as first class abstractions in any language other than a shell script. The code needed to replicate the functionality of scripts using pipes seems to always be quite complex.
So my question is why don't I see something similar to Unix pipes in modern high-level languages like C#, Java, etc.? Are there languages (other than shell scripts) which do support first class pipes? Isn't it a convenient and safe way to express concurrent algorithms?
Just in case someone brings it up, I looked at the F# pipe-forward operator (forward pipe operator), and it looks more like a function application operator. It applies a function to data, rather than connecting two streams together, as far as I can tell, but I am open to corrections.
Postscript: While doing some research on implementing coroutines, I realize that there are certain parallels. In a blog post Martin Wolf describes a similar problem to mine but in terms of coroutines instead of pipes.
Haha! Thanks to my Google-fu, I have found an SO answer that may interest you. Basically, the answer is going against the "don't overload operators unless you really have to" argument by overloading the bitwise-OR operator to provide shell-like piping, resulting in Python code like this:
for i in xrange(2,100) | sieve(2) | sieve(3) | sieve(5) | sieve(7):
print i
What it does, conceptually, is pipe the list of numbers from 2 to 99 (xrange(2, 100)) through a sieve function that removes multiples of a given number (first 2, then 3, then 5, then 7). This is the start of a prime-number generator, though generating prime numbers this way is a rather bad idea. But we can do more:
for i in xrange(2,100) | strify() | startswith(5):
print i
This generates the range, then converts all of them from numbers to strings, and then filters out anything that doesn't start with 5.
The post shows a basic parent class that allows you to overload two methods, map and filter, to describe the behavior of your pipe. So strify() uses the map method to convert everything to a string, while sieve() uses the filter method to weed out things that aren't multiples of the number.
It's quite clever, though perhaps that means it's not very Pythonic, but it demonstrates what you are after and a technique to get it that can probably be applied easily to other languages.
You can do pipelining type parallelism quite easily in Erlang. Below is a shameless copy/paste from my blogpost of Jan 2008.
Also, Glasgow Parallel Haskell allows for parallel function composition, which amounts to the same thing, giving you implicit parallelisation.
You already think in terms of
pipelines - how about "gzcat
foo.tar.gz | tar xf -"? You may not
have known it, but the shell is
running the unzip and untar in
parallel - the stdin read in tar just
blocks until data is sent to stdout by
gzcat.
Well a lot of tasks can be expressed
in terms of pipelines, and if you can
do that then getting some level of
parallelisation is simple with David
King's helper code (even across erlang
nodes, ie. machines):
pipeline:run([pipeline:generator(BigList),
{filter,fun some_filter/1},
{map,fun_some_map/1},
{generic,fun some_complex_function/2},
fun some_more_complicated_function/1,
fun pipeline:collect/1]).
So basically what he's doing here is
making a list of the steps - each step
being implemented in a fun that
accepts as input whatever the previous
step outputs (the funs can even be
defined inline of course). Go check
out David's blog entry for the
code and more detailed explanation.
magrittr package provides something similar to F#'s pipe-forward operator in R:
rnorm(100) %>% abs %>% mean
Combined with dplyr package, it brings a neat data manipulation tool:
iris %>%
filter(Species == "virginica") %>%
select(-Species) %>%
colMeans
You can find something like pipes in C# and Java, for example, where you take a connection stream and put it inside the constructor of another connection stream.
So, you have in Java:
new BufferedReader(new InputStreamReader(System.in));
You may want to look up chaining input streams or output streams.
Thanks to all of the great answers and comments, here is a summary of what I learned:
It turns out that there is an entire paradigm related to what I am interested in called Flow-based programming. A good example of a language designed specially for flow-based programming is Hartmann pipelines. Hartamnn pipelines generalize the idea of streams and pipes used in Unix and other OS's, to allows for multiple input and output streams (rather than just a single input stream, and two output streams). Erlang contains powerful abstractions that make it easy to express concurrent processes in a manner which resembles pipes. Java provides PipedInputStream and PipedOutputStream which can be used with threads to achieve the same kind of abstractions in a more verbose manner.
I think the most fundamental reason is because C# and Java tend to be used to build more monolithic systems. Culturally, it's just not common to even want to do pipe-like things -- you just make your application implement the necessary functionality. The notion of building a multitude of simple tools and then gluing them together in arbitrary ways just isn't common in those contexts.
If you look at some of the scripting languages, like Python and Ruby, there are some pretty good tools for doing pipe-like things from within those scripts. Check out the Python subprocess module, for example, which allows you to do things like:
proc = subprocess.Popen('cat -',
shell=True,
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,)
stdout_value = proc.communicate('through stdin to stdout')[0]
print '\tpass through:', stdout_value
Are you looking at the F# |> operator? I think you actually want the >> operator.
Usually you just don't need it and programs run faster without it.
Basically piping is consumer/producer pattern. And it's not that hard to write those consumers and producers because they don't share much data.
Piping for Python : pypes
Mozart-OZ can do pipes using ports and threads.
Objective-C has the NSPipe class. I use it quite frequently.
I've had a lot of fun building pipeline functions in Python. I have a library I wrote, I put the contents and a sample run here. The best fit me for has been XML processing, described in this Wikipedia article.
You can do pipe like operations in Java by chaining/filtering/transforming iterators.
You can use Google's Guava Iterators.
I will say even with the very helpful guava library and static imports its still ends up being lots of Java code.
In Scala its quite easy to make your own pipe operator.
Streaming libraries based on coroutines have existed in Haskell for quite some time now. Two popular examples are conduit and pipes.
Both libraries are well-written and well-documented, and are relatively mature. The Yesod web framework is based on conduit, and it's pretty damn fast. Yesod is competitive with Node on performance, even beating it in a few places.
Interestingly, all of the these libraries are single-threaded by default. This is because the single motivating use case for pipelines is servers, which are I/O bound.
Since R added pipe operator today, it's worth to mention Julialang has pipe all a long:
help?> |>
search: |>
|>(x, f)
Applies a function to the preceding argument. This allows for easy function chaining.
Examples
≡≡≡≡≡≡≡≡≡≡
julia> [1:5;] |> x->x.^2 |> sum |> inv
0.01818181818181818
if you're still interested in an answer...
you can look at factor, or the older joy and forth for the concatenative paradigm.
in arguments and out arguments are implicit, dumped to a stack. then the next word (function) takes that data and does something with it.
the syntax is postfix.
"123" print
where print takes one argument, whatever is in the stack.
You can use my library in python: github.com/sspipe/sspipe
In Mathematica, you can use //
for example
f[g[h[x,parm1],parm2]]
quite a mess.
could be written as
x // h[#, parm1]& // g[#, parm2]& // f
the # and & is lambda in Mathematica
In js, there seems to be pipe operator |> soon.
https://github.com/tc39/proposal-pipeline-operator