NLTK Clause and Phrase breakdowns - nltk

Is there a way to get NLTK to return text fully marked with all Treebank clause and Treebank phrase demarcations (or equivalent; it need not be Treebank)? I need to be able to return both clauses and phrases (separately). The only thing on this that I have found is in the NLTK Bird/Klein/Loper book in chapter 7 where it says you can not process for noun phrases and verb phrases at the same time, but I want to do much more than that! I think the Stanford POS parser does this but the client wants to use only the NLTK. Thanks.

Have you looked at chapter 8 yet? It sounds like you want something like:
>>> from nltk.corpus import treebank
>>> t = treebank.parsed_sents('wsj_0001.mrg')[0]
>>> print(t)
(S
  (NP-SBJ
    (NP (NNP Pierre) (NNP Vinken))
    (, ,)
    (ADJP (NP (CD 61) (NNS years)) (JJ old))
    (, ,))
  (VP
    (MD will)
    (VP
      (VB join)
      (NP (DT the) (NN board))
      (PP-CLR
        (IN as)
        (NP (DT a) (JJ nonexecutive) (NN director)))
      (NP-TMP (NNP Nov.) (CD 29))))
  (. .))
in addition to the chunking resources that you have already found. But if you mean that you want to parse text you supply, there are also options like:
>>> import nltk
>>> sr_parse = nltk.ShiftReduceParser(grammar1)
>>> sent = 'Mary saw a dog'.split()
>>> for tree in sr_parse.parse(sent):  # parse() yields trees in NLTK 3
...     print(tree)
(S (NP Mary) (VP (V saw) (NP (Det a) (N dog))))
but this relies on grammar1 being populated manually beforehand. Chunking is easier than parsing.
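Once you have a bracketed tree like the ones above, splitting it into clauses and phrases is a matter of walking the tree and sorting constituents by label. The following is a minimal sketch in plain Python, not NLTK's own API: the helper names (`parse_bracketed`, `split_clauses_and_phrases`) and the choice of which labels count as clause-level (`CLAUSE_LABELS`) are assumptions made for illustration. With NLTK itself, `Tree.subtrees()` with a filter on `label()` should allow the same split.

```python
import re

# Assumption: these Treebank labels are treated as clause-level;
# every other non-POS label is treated as a phrase.
CLAUSE_LABELS = {"S", "SBAR", "SBARQ", "SINV", "SQ"}


def parse_bracketed(text):
    """Parse a Treebank-style bracketed string into (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])   # a word
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()


def leaves(node):
    """Return the words under a node, left to right."""
    words = []
    for child in node[1]:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


def constituents(node):
    """Yield the node and all nested nodes in pre-order."""
    yield node
    for child in node[1]:
        if isinstance(child, tuple):
            yield from constituents(child)


def split_clauses_and_phrases(tree):
    """Return (clauses, phrases) as lists of (label, text) pairs."""
    clauses, phrases = [], []
    for node in constituents(tree):
        label, children = node
        if all(isinstance(child, str) for child in children):
            continue                  # POS tag directly over a word
        base = label.split("-")[0]    # strip function tags: NP-SBJ -> NP
        target = clauses if base in CLAUSE_LABELS else phrases
        target.append((base, " ".join(leaves(node))))
    return clauses, phrases


if __name__ == "__main__":
    tree = parse_bracketed(
        "(S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) "
        "(VP (VBZ likes) (S (VP (VBG eating) (NP (NN sausage))))) (. .))"
    )
    clauses, phrases = split_clauses_and_phrases(tree)
    print(clauses)
    print(phrases)
```

The clause/phrase boundary is a design choice: adjust `CLAUSE_LABELS` if your tag set differs from the Penn Treebank.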

Related

How to generate a parse tree using NLTK?

I am trying to generate a tree like this:
I am not able to find any relevant information in regard to it. Please help.
Parse
(ROOT
  (S
    (NP (PRP$ My) (NN dog))
    (ADVP (RB also))
    (VP (VBZ likes)
      (S
        (VP (VBG eating)
          (NP (NN sausage)))))
    (. .)))
Thanks.
The NLTK comes with a number of parsers based on CFG and other grammar formalisms, but they are teaching tools of very little practical use: They can only handle a tiny subset of English syntax. (If this is what you are after, your question is a duplicate of this SO question.)
To parse ordinary English text with the nltk, you'll need to install a third-party parser that the nltk knows how to interface with. Your best bet is probably the Stanford Parser, as you probably already knew since you tagged your question stanford-nlp. You'll need the latest version of the nltk (or version 3.1 at least, but later is better.) The abovementioned SO question has some other suggestions in the answers; no idea if they are any good.
You can use Stanford CoreNLP to achieve that.
Install the Python client:
pip install pycorenlp
Start the server from the CoreNLP directory (stanford-corenlp-full-2018-01-31 here) with this command:
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
Then query the server from Python:
from pycorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP('http://localhost:9000')
# textInput holds the text to parse, e.g. the sample input shown below
output = nlp.annotate(textInput, properties={
    'annotators': 'parse',
    'outputFormat': 'json',
    'timeout': 1000,
})
print(output['sentences'][0]["parse"])
sample input :
Is there any way to associate spotify music with a specific ambient so when I say to Siri Start Beach Ambient
output :
(ROOT
  (SQ (VBZ Is)
    (NP (EX there))
    (NP
      (NP (DT any) (NN way))
      (S
        (VP (TO to)
          (VP (VB associate)
            (NP (JJ spotify) (NN music))
            (PP (IN with)
              (NP
                (NP (DT a) (JJ specific))
                (ADJP (JJ ambient) (RB so)
                  (SBAR
                    (WHADVP (WRB when))
                    (S
                      (NP (PRP I))
                      (VP (VBP say)
                        (PP (TO to)
                          (NP (NNP Siri) (NNP Start) (NNP Beach) (NNP Ambient)))))))))))))
    (. .)))
hope this may help.

emacs-org mode and html publishing: how to change structure of generated HTML

I'm a beginner at this, so sorry if I overlook something simple...
I'd like to use Emacs org-mode for my HTML pages. The 'default' setup is nice and works, but I'd like to use one of the free web templates, e.g. http://www.freecsstemplates.org/preview/goodlife/
These templates provide CSS files, but using only the CSS in org-mode's HTML export does not seem to be enough. To use these templates correctly, it seems I also need to reproduce the HTML structure shown in the template.
How can I force org-mode to generate the HTML structure I want (i.e. the frame divisions)?
It seems that some options are offered by 'org-export-generic.el'. But even if I persuaded the generic export to produce a single HTML page, that still would not completely solve the HTML export problem....
This section of the org-mode manual provides some guidance on exporting to HTML and using CSS: http://orgmode.org/manual/CSS-support.html#CSS-support It includes a description of the default classes org-mode uses, so you could modify your CSS to match them.
If you want to modify org mode exports to match your CSS classes and ids use the :HTML_CONTAINER_CLASS: property in an org headline and the :CUSTOM_ID: property for creating ids.
Instead of setting things up per file I use org mode's publishing ability to output many org files into a single website. You can find a tutorial on that here http://orgmode.org/worg/org-tutorials/org-publish-html-tutorial.html
My org-publish-project-alist looks like:
'(org-publish-project-alist (quote (("requirements" :components ("req-static" "req-org"))
("req-static" :base-directory "~/org/requirements" :publishing-directory "~/public_html/requirements/" :base-extension "gif\\|css" :publishing-function org-publish-attachment)
("req-org" :base-directory "~/org/requirements/" :publishing-directory "~/public_html/requirements/" :style "<link rel=\"stylesheet\" type=\"text/css\" href=\"./style.css\" />" :section-numbers nil :headline-levels 3 :table-of-contents 2 :auto-sitemap t :sitemap-filename "index.org" :sitemap-title "Requirements for My Software" :link-home "./index.html"))))
I agree. The HTML generated by org's built-in export is good but not quite what I'd want. It appears that the generic export is based on elisp, whereas I prefer XSLT.
I wrote the following code for turning an org file into XML, but I haven't written the publishing transforms yet. Anyway, this may be helpful for your reference, especially as it shows the structure of an org document's internal representation.
(require 'org-element)
(defvar xml-content-encode-map
'((?& . "&amp;")
(?< . "&lt;")
(?> . "&gt;")))
(defvar xml-attribute-encode-map
(cons '(?\" . "&quot;") xml-content-encode-map))
(defun write-xml (o out parents depth)
"Writes O as XML to OUT, assuming that lists have a plist as
their second element (for representing attributes). Skips basic
cycles (elements pointing to ancestor), and compound values for
attributes."
(if (not (listp o))
;; TODO: this expression is repeated below
(princ o (lambda (charcode)
(princ
(or (aget xml-content-encode-map charcode)
(char-to-string charcode))
out)))
(unless (member o parents)
(let ((parents-and-self (cons o parents))
(attributes (second o)))
(dotimes (x depth) (princ "\t" out))
(princ "<" out)
(princ (car o) out)
(loop for x on attributes by 'cddr do
(let ((key (first x))
(value (second x)))
(when (and value (not (listp value)))
(princ " " out)
(princ (substring (symbol-name key) 1) out)
(princ "=\"" out)
(princ value (lambda (charcode)
(princ
(or (aget xml-attribute-encode-map charcode)
(char-to-string charcode))
out)))
(princ "\"" out))))
(princ ">\n" out)
(loop for e in (cddr o) do
(write-xml e out parents-and-self (+ 1 depth)))
(dotimes (x depth) (princ "\t" out))
(princ "</" out)
(princ (car o) out)
(princ ">\n" out)))))
(defun org-file-to-xml (orgfile xmlfile)
"Serialize ORGFILE file as XML to XMLFILE."
(save-excursion
(find-file orgfile)
(let ((org-doc (org-element-parse-buffer)))
(with-temp-file xmlfile
(let ((buffer (current-buffer)))
(princ "<?xml version='1.0'?>\n" buffer)
(write-xml org-doc buffer () 0)
(nxml-mode)))))
(find-file xmlfile)
(nxml-mode))
(defun org-to-xml ()
"Export the current org file to XML and open in new buffer.
Does nothing if the current buffer is not in org-mode."
(interactive)
(when (eq major-mode 'org-mode)
(org-file-to-xml
(buffer-file-name)
(concat (buffer-file-name) ".xml"))))

haskell word searching program development

Hello, I am writing a word searching program.
For example, when the file "text.txt" contains "foo foos foor fo.. foo fool"
and I search for "foo",
only the number 2 should be printed,
and I should be able to search again and again.
But I am a Haskell beginner.
My code is here:
:module +Text.Regex.Posix
putStrLn "type text file"
filepath <- getLine
data <- readFile filepath
--1. this makes <interactive>:1:1: parse error on input `data' how to fix it?
parsedData =~ "[^- \".,\n]+" :: [[String]]
--2. I want to make function and call it again and again
searchingFunc = do putStrLn "search for ..."
search <- getLine
result <- map (\each -> if each == search then count = count + 1) data
putStrLn result
searchingFunc
}
sorry for very very poor code
my development environment is Windows XP SP3 WinGhci 1.0.2
I started the haskell several hours ago sorry
thank you very much for reading!
edit: here's original scheme code
thanks!
#lang scheme/gui
(define count 0)
(define (search str)
(set! count 0)
(map (λ (each) (when (equal? str each) (set! count (+ count 1)))) data)
(send msg set-label (format "~a Found" count)))
(define path (get-file))
(define port (open-input-file path))
(define data '())
(define (loop [line (read-line port)])
(when (not (eof-object? line))
(set! data (append data
(regexp-match* #rx"[^- \".,\n]+" line)))
(loop)))
(loop)
(define (cb-txt t e) (search (send t get-value)))
(define f (new frame% (label "text search") (min-width 300)))
(define txt (new text-field% (label "type here to search") (parent f) (callback (λ (t e) (cb-txt t e)))))
(define msg (new message% (label "0Found ") (parent f)))
(send f show #t)
I should start by reiterating what everyone would (and should) say: start with a book like Real World Haskell! That said, I'll post a quick walkthrough of code that compiles and hopefully does something close to what you originally intended. Comments are inline and should illustrate some of the shortcomings of your approach.
import Text.Regex.Posix
-- Let's start by wrapping your first attempt into a 'Monadic Action'
-- IO is a monad, and hence we can sequence 'actions' (read as: functions)
-- together using do-notation.
attemptOne :: IO [[String]]
-- ^ type declaration of the function 'attemptOne'
-- read as: function returning value having type 'IO [[String]]'
attemptOne = do
  putStrLn "type text file"
  filePath <- getLine
  fileData <- readFile filePath
  putStrLn fileData
  let parsed = fileData =~ "[^- \".,\n]+" :: [[String]]
  -- ^ this form of let syntax allows us to declare that
  --   'wherever there is a use of the left-hand-side, we can
  --   substitute it for the right-hand-side and get equivalent
  --   results.
  putStrLn ("The data after running the regex: " ++ concatMap concat parsed)
  return parsed
  -- ^ return is a monadic action that 'lifts' a value
  --   into the encapsulating monad (in this case, the 'IO' Monad).
-- Here we show that given a search term (a String), and a body of text to
-- search in, we can return the frequency of occurrence of the term within the
-- text.
searchingFunc :: String -> [String] -> Int
searchingFunc term
  = length . filter predicate
  where
    predicate = (==) term
-- ^ we use function composition (.) to create a new function from two
-- existing ones:
-- filter (drop any elements of a list that don't satisfy
-- our predicate)
-- length: return the size of the list
-- Here we build a wrapper-function that allows us to run our 'pure'
-- searchingFunc on an input of the form returned by 'attemptOne'.
runSearchingFunc :: String -> [[String]] -> [Int]
runSearchingFunc term parsedData
  = map (searchingFunc term) parsedData
-- Here's an example of piecing everything together with IO actions
main :: IO ()
main = do
  results <- attemptOne
  -- ^ run our attemptOne function (representing IO actions)
  --   and save the result
  let searchResults = runSearchingFunc "foo" results
  -- ^ use a 'let' binding to state that searchResults is
  --   equivalent to running 'runSearchingFunc'
  print searchResults
  -- ^ run the IO action that prints searchResults
  print (runSearchingFunc "foo" results)
  -- ^ run the IO action that prints the 'definition'
  --   of 'searchResults'; i.e. the above two IO actions
  --   are equivalent.
  return ()
  -- as before, lift a value into the encapsulating Monad;
  -- this time, we're lifting a value corresponding to 'null/void'.
To load this code, save it into a .hs file (I saved it into 'temp.hs'), and run the following from ghci. Note: the file 'f' contains a few input words:
*Main Text.Regex.Posix> :l temp.hs
[1 of 1] Compiling Main ( temp.hs, interpreted )
Ok, modules loaded: Main.
*Main Text.Regex.Posix> main
type text file
f
foo foos foor fo foo foo
The data after running the regex: foofoosfoorfofoofoo
[1,0,0,0,1,1]
[1,0,0,0,1,1]
There is a lot going on here, from do notation to monadic actions, and from 'let' bindings to the distinction between pure and impure functions/values. I can't stress enough the value of learning the fundamentals from a good book!
Here is what I made of it. It doesn't do any error checking and is as basic as possible.
import Text.Regex.Posix ((=~))
import Control.Monad (when)
import Text.Printf (printf)
-- Calculates the number of matching words
matchWord :: String -> String -> Int
matchWord file word = length . filter (== word) . concat $ matches
  where
    -- The annotation selects the "all matches" result type of (=~).
    matches = file =~ "[^- \".,\n]+" :: [[String]]
getInputFile :: IO String
getInputFile = do
  putStrLn "Enter the file to search through:"
  path <- getLine
  readFile path -- Attention! No error checking here
repl :: String -> IO ()
repl file = do
  putStrLn "Enter word to search for (empty for exit):"
  word <- getLine
  when (word /= "") $ do
    print $ matchWord file word
    repl file
main :: IO ()
main = do
  file <- getInputFile
  repl file
Please start step by step. IO in Haskell is hard, so you shouldn't start with file manipulation. I would suggest writing a function that works properly on a given String. That way you can learn about syntax, pattern matching, list manipulation (maps, folds) and recursion without being distracted by the do notation (which looks imperative but isn't, and really needs a deeper understanding).
You should check out Learn you a Haskell or Real World Haskell to get a sound foundation. What you do now is just stumbling in the dark - which may work if you learn languages that are similar to the ones you know, but definitely not for Haskell.
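The "pure function on a given String first" advice can be illustrated independently of Haskell. The sketch below uses Python only for brevity; `count_word` and `WORD_PATTERN` are hypothetical names, and the token pattern is the one from the question. It pins down the expected behaviour (the sample text should give 2 for "foo") before any file IO or looping is added.

```python
import re

# Same token pattern as in the question: runs of characters that are
# not '-', space, '"', '.', ',' or newline.
WORD_PATTERN = re.compile(r'[^- ".,\n]+')


def count_word(text, word):
    """Count the tokens in text that are exactly equal to word."""
    return sum(1 for token in WORD_PATTERN.findall(text) if token == word)


if __name__ == "__main__":
    sample = "foo foos foor fo.. foo fool"
    print(count_word(sample, "foo"))  # prints 2
```

Only once such a function behaves as expected is it worth wrapping it in the read-file-then-prompt loop.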

Clojure: creating new instance from String class name

In Clojure, given a class name as a string, I need to create a new instance of the class. In other words, how would I implement new-instance-from-class-name in
(def my-class-name "org.myorg.pkg.Foo")
; calls constructor of org.myorg.pkg.Foo with arguments 1, 2 and 3
(new-instance-from-class-name my-class-name 1 2 3)
I am looking for a solution more elegant than
calling the Java newInstance method on a constructor from the class
using eval, load-string, ...
In practice, I will be using it on classes created using defrecord. So if there is any special syntax for that scenario, I would be quite interested.
There are two good ways to do this. Which is best depends on the specific circumstance.
The first is reflection:
(clojure.lang.Reflector/invokeConstructor
(resolve (symbol "Integer"))
(to-array ["16"]))
That's like calling (new Integer "16") ...include any other ctor arguments you need in the to-array vector. This is easy, but slower at runtime than using new with sufficient type hints.
The second option is as fast as possible, but a bit more complicated, and uses eval:
(defn make-factory [classname & types]
  (let [args (map #(with-meta (symbol (str "x" %2)) {:tag %1}) types (range))]
    (eval `(fn [~@args] (new ~(symbol classname) ~@args)))))
(def int-factory (make-factory "Integer" 'String))
(int-factory "42")
The key point is to eval code that defines an anonymous function, as make-factory does. This is slow -- slower than the reflection example above, so only do it as infrequently as possible such as once per class. But having done that you have a regular Clojure function that you can store somewhere, in a var like int-factory in this example, or in a hash-map or vector depending on how you'll be using it. Regardless, this factory function will run at full compiled speed, can be inlined by HotSpot, etc. and will always run much faster than the reflection example.
When you're specifically dealing with classes generated by deftype or defrecord, you can skip the type list since those classes always have exactly two ctors each with different arities. This allows something like:
(defn record-factory [recordname]
  (let [recordclass ^Class (resolve (symbol recordname))
        max-arg-count (apply max (map #(count (.getParameterTypes %))
                                      (.getConstructors recordclass)))
        args (map #(symbol (str "x" %)) (range (- max-arg-count 2)))]
    (eval `(fn [~@args] (new ~(symbol recordname) ~@args)))))
(defrecord ExampleRecord [a b c])
(def example-record-factory (record-factory "ExampleRecord"))
(example-record-factory "F." "Scott" 'Fitzgerald)
Since 'new' is a special form, I'm not sure you can do this without a macro. Here is a way to do it using a macro:
user=> (defmacro str-new [s & args] `(new ~(symbol s) ~@args))
#'user/str-new
user=> (str-new "String" "LOL")
"LOL"
Check out Michal's comment on the limitations of this macro.
Here is a technique for extending defrecord to automatically create well-named constructor functions to construct record instances (either new or based on an existing record).
http://david-mcneil.com/post/765563763/enhanced-clojure-records
In Clojure 1.3, defrecord will automatically defn a factory function using the record name with "->" prepended. Similarly, a variant that takes a map will be the record name prepended with "map->".
user=> (defrecord MyRec [a b])
user.MyRec
user=> (->MyRec 1 "one")
#user.MyRec{:a 1, :b "one"}
user=> (map->MyRec {:a 2})
#user.MyRec{:a 2, :b nil}
A macro like this should work to create an instance from the string name of the record type:
(defmacro newbie [recname & args] `(~(symbol (str "->" recname)) ~@args))

Is there an Emacs Lisp library for generating HTML?

I'm looking for a solution that allows me to write native Emacs Lisp code and at compile time turns it into HTML, like Franz's htmlgen:
(html
((:div class "post")
(:h1 "Title")
(:p "Hello, World!")))
Of course I can write my own macros, but I'm interested if there are any projects around this problem.
As you found out, xmlgen generates XML from a list structure. What I did find disappointing with the xmlgen package is that the format it supports is not quite the inverse of Emacs' xml parser.
I did add this to my copy of xmlgen:
;; this creates a routine to be the inverse of what xml-parse does
;;;###autoload
(defun xml-gen (form &optional in-elm level)
"Convert a sexp to xml:
'(p :class \"big\")) => \"<p class=\\\"big\\\" />\""
(let ((level (or level 0)))
(cond
((numberp form) (number-to-string form))
((stringp form) form)
((listp form)
(destructuring-bind (xml attrs) (xml-gen-extract-plist form)
(let ((el (car xml)))
(unless (symbolp el)
(error "Element must be a symbol (got '%S')." el))
(setq el (symbol-name el))
(concat "<" el (xml-gen-attr-to-string attrs)
(if (> (length xml) 1)
(concat ">" (mapconcat
(lambda (s) (xml-gen s el (1+ level)))
(cdr xml)
"")
"</" el ">")
"/>"))))))))
(defun xml-gen-attr-to-string (plist)
(reduce 'concat (mapcar (lambda (p) (concat " " (symbol-name (car p)) "=\"" (cdr p) "\"")) plist)))
(defun xml-gen-extract-plist (list)
(list (cons (car list) (let ((kids (xml-node-children list)))
(if (= 1 (length kids))
kids
(remove-if-not 'listp kids))))
(xml-node-attributes list)))
Note: the interface for this is xml-gen (not xmlgen which is the original parsing).
With this interface, the following holds:
(string-equal (xml-gen (car (xml-parse-region <some-region-of-xml>)))
<some-region-of-xml>)
and
(equal (car (xml-parse-region (insert (xml-gen <some-xml-form>))))
<some-xml-form>)
The new xml-gen does not strive to preserve the surrounding whitespace that the xml-parse-region routine generates.
This could be a starting point: http://www.emacswiki.org/emacs/HtmlLite
This is not quite what you're looking for, but there's a 20 minute video where a guy creates a simple website using UCW, the UnCommon Web application framework. It's all done in Emacs using lisp...
Here is a link to the transcript (all the code (~25 lines) is available at the end of the transcript).
Meanwhile, I found some code that contains something similar to what I want. Now I can write:
(views-with-html
((body)
(h1 "Title")
((p (class . "entry")) "Hello, World!")))
The implementation has a few limitations (e.g. hard-coded element list), but it seems to be a good starting point.
I had a similar requirement to be able to parse xml using xml-parse functions, transform it, and then output it back as a xml string.
Trey's solution almost worked except I needed to retain the whitespace xml elements. So I wrote my own implementation here:
https://github.com/upgradingdave/xml-to-string
Have you considered yaclml?
yaclml (Yet Another Common Lisp Markup Language) is an HTML generator and HTML template library. yaclml is used as the html templating backend for the ucw web framework.
https://www.cliki.net/yaclml
It is in common lisp. Not elisp. But, ...