How to generate a parse tree using NLTK?

I am trying to generate a tree like this:
I am not able to find any relevant information in regard to it. Please help.
Parse
(ROOT
(S
(NP (PRP$ My) (NN dog))
(ADVP (RB also))
(VP (VBZ likes)
(S
(VP (VBG eating)
(NP (NN sausage)))))
(. .)))
Thanks.

The NLTK comes with a number of parsers based on CFG and other grammar formalisms, but they are teaching tools of very little practical use: They can only handle a tiny subset of English syntax. (If this is what you are after, your question is a duplicate of this SO question.)
To parse ordinary English text with the nltk, you'll need to install a third-party parser that the nltk knows how to interface with. Your best bet is probably the Stanford Parser, as you probably already knew since you tagged your question stanford-nlp. You'll need the latest version of the nltk (or version 3.1 at least, but later is better.) The abovementioned SO question has some other suggestions in the answers; no idea if they are any good.

You can use Stanford CoreNLP to achieve that.
Install the Python client:
pip install pycorenlp
Start the server from the stanford-corenlp-full-2018-01-31 directory with this command:
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
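from pycorenlp import StanfordCoreNLP
# Connect to the CoreNLP server started with the command above
nlp = StanfordCoreNLP('http://localhost:9000')
# The text to parse (this is the sample input shown below)
textInput = 'Is there any way to associate spotify music with a specific ambient so when I say to Siri Start Beach Ambient'
output = nlp.annotate(textInput, properties={
'annotators': 'parse',
'outputFormat': 'json',
'timeout': 1000,
})
# Each sentence carries its constituency parse as a bracketed string
print(output['sentences'][0]["parse"])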
sample input :
Is there any way to associate spotify music with a specific ambient so when I say to Siri Start Beach Ambient
output :
(ROOT
(SQ (VBZ Is)
(NP (EX there))
(NP
(NP (DT any) (NN way))
(S
(VP (TO to)
(VP (VB associate)
(NP (JJ spotify) (NN music))
(PP (IN with)
(NP
(NP (DT a) (JJ specific))
(ADJP (JJ ambient) (RB so)
(SBAR
(WHADVP (WRB when))
(S
(NP (PRP I))
(VP (VBP say)
(PP (TO to)
(NP (NNP Siri) (NNP Start) (NNP Beach) (NNP Ambient)))))))))))))
(. .)))
hope this may help.
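The "parse" value printed above is just a bracketed string. If you are working with NLTK, Tree.fromstring is the usual way to turn such a string into a tree object. As a dependency-free illustration of what that conversion involves, here is a minimal sketch in plain Python. The function names parse_bracketed and leaves are made up for this example, and the code assumes well-formed input (it does not try to recover from unbalanced parentheses).

```python
import re


def parse_bracketed(text):
    """Parse a Penn Treebank-style bracketed string into nested lists.

    Each constituent becomes [label, child, child, ...]; words are plain strings.
    Hypothetical helper for illustration, not part of NLTK or pycorenlp.
    """
    # Tokens are "(", ")", or any run of characters without spaces/parentheses.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1                      # consume the label
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # consume ")"
        return [label] + children

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the root constituent")
    return tree


def leaves(node):
    """Return the words of a parsed tree in sentence order."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


# The tree from the question, written on one line.
parse = ("(ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) "
         "(VP (VBZ likes) (S (VP (VBG eating) (NP (NN sausage))))) (. .)))")
tree = parse_bracketed(parse)
print(tree[0])        # label of the root constituent
print(leaves(tree))   # the words of the sentence
```

The nested-list form is easy to walk recursively, which is all you need to pull out, say, every NP or VP subtree.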


Input/Output json file in lisp

Good morning everyone,
To finish this project I need your help again.
I'm now trying to create two functions for reading and writing files in Lisp.
This is the description of how the functions must work:
(json-load filename) -> JSON
(json-write JSON filename) -> filename
The json-load function opens the file filename and returns a JSON object (or generates an error). If
filename does not exist the function generates an error. The suggestion is to read the whole file into one
string and then to call json-parse.
The json-write function writes the JSON object to the filename file in JSON syntax. If
filename does not exist, it is created, and if it exists it is overwritten. Of course it is expected that
CL-PROMPT> (json-load (json-write '(json-obj #| stuff |#) "foo.json"))
(json-obj #| stuff |#)
this is my json-load function
(defun json-load (filename)
(with-open-file (file-stream filename)
(let ((file-contents (make-string (file-length file-stream))))
(read-sequence file-contents file-stream)
file-contents)) (json-parse (file-contents)))
but it is not working.
I need some help writing the write function too.
Thanks, guys.
edit 1:
(defun json-load (filename)
(with-open-file (in filename
:direction :input
:if-does-not-exist :error)
(file-get-contents filename))
(json-parse filename))
(defun file-get-contents (filename)
(with-open-file (stream filename)
(let ((contents (make-string (file-length stream))))
(read-sequence contents stream)
contents)))
So the function should not be far from correct, but I think the problem is the file-get-contents function.
I think so because if I run this function the output is
"\"{\\\"nome\\\" : \\\"Arthur\\\",\\\"cognome\\\" : \\\"Dent\\\"}\""
and so json-parse no longer recognizes the JSON object.
Any ideas?
edit 2:
I tried both functions, with the same result. If I call json-parse directly with the same JSON object that is in the file, everything is fine, but if I call json-load, Lisp responds with my own error message "undefined JSON object (json-parse)".
Why?
Edit 3:
This is my json-write function but, for now, it doesn't work.
(defun json-write (json filename)
(with-open-file (out filename
:direction :output
:if-exists :overwrite
:if-does-not-exist :create)
(pprint (json out))))
The description at the beginning of the post says that the json-write function writes the JSON object to the filename file in JSON syntax.
Now, two questions:
1) Is my function partially correct?
2) How can I write a JSON object in JSON syntax?
Thanks
I'm working on the same project, hopefully the professors don't mind us sharing info ;)
This is the approach I took:
(defun json-load (filename)
(with-open-file (in filename
:direction :input
:if-does-not-exist :error)
(multiple-value-bind (s) (make-string (file-length in))
(read-sequence s in)
(json-parse s))))
Remember that read-sequence overwrites the given sequence, in this case s. I'm using multiple-value-bind simply so that I don't have to use either a variable declaration or a lambda function (although, as @tfb pointed out, it is just a less idiomatic version of (let ((v form)) ...)).

Open .htm .html files automatically with shr.el in emacs

I've just discovered the shr package in emacs 24.5.1
i.e.
C-x C-f anyfile.html
M-x shr-render-buffer
Looks really good - just what I was after
Can I automate emacs to call shr-render-buffer when I open any .htm or .html file?
UPDATE
I've tried adding the following to my .emacs:
(add-to-list 'auto-mode-alist '("[.]htm$" . shr-render-buffer))
(add-to-list 'auto-mode-alist '("[.]html$" . shr-render-buffer))
but I get the error:
File mode specification error: (void-function shr-render-buffer)
The html file then gets opened in Fundamental mode and it looks even worse than HTML mode
It seems you want to run the function shr-render-buffer automatically once an HTML file is opened. As you said, the mode for .htm/.html files is html-mode by default, so you can add the function call to html-mode-hook, for example:
(add-hook 'html-mode-hook '(lambda() (shr-render-buffer (current-buffer))))
As @lawlist pointed out, put it after (require 'shr).
As this is emacs, the hardest part of doing what you want is deciding on what is the best approach. This largely depends on personal taste/workflows. I would highly recommend looking at the browse-url package in more detail. One thing I use is a function which allows me to switch between using eww or my default system browser - this means I can easily render web content either in emacs or in chrome/safari/whatever.
Some years ago, I wrote a utility which would allow me to view a number of different file formats, including rendered html, in emacs. I rarely use this now as doc-view has pretty much replaced most of this functionality and is much better. However, it does show how you can use defadvice to modify the view-file function so that it does different things depending on the file type. Note that as this is old emacs code and emacs has improved, there are probably better ways of doing this now. I also know that the 'advice' stuff has been re-worked, but this legacy stuff still works OK. Should get you started. Note that the functionality for MS doc, docx, pdf etc relies on external executables.
My preferred workflow would be to write a function which allows me to reset the browse-url-browser-function to either eww-browse-url or browse-url-default-browser and bind that to a key. I can then choose to display the html in emacs or the external browser and leverage all the work already done in browse-url.
(require 'custom)
(require 'browse-url)
;; make-temp-file is part of apel prior to emacs 22
;;(static-when (= emacs-major-version 21)
;; (require 'poe))
(defgroup txutils nil
"Customize group for txutils."
:prefix "txutils-"
:group 'External)
(defcustom txutils-convert-alist
'( ;; MS Word
("\\.\\(?:DOC\\|doc\\)$" doc "/usr/bin/wvText" nil nil nil nil nil)
;; PDF
("\\.\\(?:PDF\\|pdf\\)$" pdf "/usr/bin/pdftotext" nil nil nil nil nil)
;; PostScript
("\\.\\(?:PS\\|ps\\)$" ps "/usr/bin/pstotext" "-output" t nil nil nil)
;; MS PowerPoint
("\\.\\(?:PPT\\|ppt\\)$" ppt "/usr/bin/ppthtml" nil nil nil t t))
"*Association for program conversion.
Each element has the following form:
(REGEXP SYMBOL CONVERTER SWITCHES INVERT REDIRECT-INPUT REDIRECT-OUTPUT HTML-OUTPUT)
Where:
REGEXP is a regexp to match file type to convert.
SYMBOL is a symbol to designate the file type.
CONVERTER is a program to convert the file type to text or HTML.
SWITCHES is a string which gives command line switches for the conversion
program. Nil means there are no switches needed.
INVERT indicates if input and output program option is to be
inverted or not. Non-nil means to invert, that is, output
option first then input option. Nil means do not invert,
that is, input option first then output option.
REDIRECT-INPUT indicates to use < to direct input from the input
file. This is useful for utilities which accept input
from stdin rather than a file.
REDIRECT-OUTPUT indicates to use > to direct output to the output
file. This is useful for utilities that only send output to
stdout.
HTML-OUTPUT Indicates the conversion program creates HTML output
rather than plain text."
:type '(repeat
(list :tag "Conversion"
(regexp :tag "File Type Regexp")
(symbol :tag "File Type Symbol")
(string :tag "Converter")
(choice :menu-tag "Output Option"
:tag "Output Option"
(const :tag "None" nil)
string)
(boolean :tag "Invert I/O Option")
(boolean :tag "Redirect Standard Input")
(boolean :tag "Redirect Standard Output")
(boolean :tag "HTML Output")))
:group 'txutils)
(defun txutils-run-command (cmd &optional output-buffer)
"Execute shell command with arguments, putting output in buffer."
(= 0 (shell-command cmd (if output-buffer
output-buffer
"*txutils-output*")
(if output-buffer
"*txutils-output*"))))
(defun txutils-quote-expand-file-name (file-name)
"Expand file name and quote special chars if required."
(shell-quote-argument (expand-file-name file-name)))
(defun txutils-file-alist (file-name)
"Return alist associated with file of this type."
(let ((al txutils-convert-alist))
(while (and al
(not (string-match (caar al) file-name)))
(setq al (cdr al)))
(if al
(cdar al)
nil)))
(defun txutils-make-temp-name (orig-name type-alist)
"Create a temp file name from original file name"
(make-temp-file (file-name-sans-extension
(file-name-nondirectory orig-name)) nil
(if (nth 6 type-alist) ; HTML-OUTPUT is index 6 because txutils-file-alist drops REGEXP
".html"
".txt")))
(defun txutils-build-cmd (input-file output-file type-alist)
"Create the command string from conversion alist."
(let ((f1 (if (nth 3 type-alist)
output-file
input-file))
(f2 (if (nth 3 type-alist)
input-file
output-file)))
(concat
(nth 1 type-alist)
(if (nth 2 type-alist) ; Add cmd line switches
(concat " " (nth 2 type-alist)))
(if (nth 4 type-alist) ; redirect input (which may be output
(concat " < " f1) ; if arguments are inverted!)
(concat " " f1))
(if (nth 5 type-alist) ; redirect output (see above comment)
(concat " > " f2)
(concat " " f2)))))
(defun txutils-do-file-conversion (file-name)
"Based on file extension, convert file to text. Return name of text file"
(interactive "fFile to convert: ")
(let ((f-alist (txutils-file-alist file-name))
output-file)
(when f-alist
(message "Performing file conversion for %s." file-name)
(setq output-file (txutils-make-temp-name file-name f-alist))
(message "Command: %s" (txutils-build-cmd file-name output-file f-alist))
(if (txutils-run-command
(txutils-build-cmd (txutils-quote-expand-file-name file-name)
(txutils-quote-expand-file-name
output-file) f-alist))
output-file
file-name))))
(defadvice view-file (around txutils pre act comp)
"Perform file conversion or call web browser to view contents of file."
(let ((file-arg (ad-get-arg 0)))
(if (txutils-file-alist file-arg)
(ad-set-arg 0 (txutils-do-file-conversion file-arg)))
(if (string-match "\\.\\(?:HTML?\\|html?\\)$" (ad-get-arg 0))
(browse-url-of-file (ad-get-arg 0))
ad-do-it)))
(provide 'init-text-convert)

emacs-org mode and html publishing: how to change structure of generated HTML

I'm a beginner at this, so sorry if I overlook something simple...
I'd like to use emacs org-mode for my HTML pages. The 'default' setup is nice and working, however I'd like to use one of the free web templates, e.g. http://www.freecsstemplates.org/preview/goodlife/
These templates provide CSS files, but just using the CSS in org-mode's HTML export does not seem to be enough. It seems that to use these templates correctly I also need to reproduce the HTML structure shown in the template.
How can I force org-mode to generate the HTML structure I want (i.e. the division into frames)?
It seems that some options are offered by 'org-export-generic.el'. But even if I could persuade the generic export to give me a single HTML page, that still would not completely solve the HTML export....
This section of the org-mode manual provides some guidance on exporting to HTML and using CSS: http://orgmode.org/manual/CSS-support.html#CSS-support. It includes a description of the default classes org-mode uses, so you could modify your CSS to match.
If you want to modify org-mode exports to match your CSS classes and ids, use the :HTML_CONTAINER_CLASS: property in an org headline and the :CUSTOM_ID: property for creating ids.
Instead of setting things up per file, I use org-mode's publishing ability to output many org files into a single website. You can find a tutorial on that here: http://orgmode.org/worg/org-tutorials/org-publish-html-tutorial.html
My org-publish-project-alist looks like:
'(org-publish-project-alist (quote (("requirements" :components ("req-static" "req-org"))
("req-static" :base-directory "~/org/requirements" :publishing-directory "~/public_html/requirements/" :base-extension "gif\\|css" :publishing-function org-publish-attachment)
("req-org" :base-directory "~/org/requirements/" :publishing-directory "~/public_html/requirements/" :style "<link rel=\"stylesheet\" type=\"text/css\" href=\"./style.css\" />" :section-numbers nil :headline-levels 3 :table-of-contents 2 :auto-sitemap t :sitemap-filename "index.org" :sitemap-title "Requirements for My Software" :link-home "./index.html"))))
I agree. The HTML generated by org's built-in export is good but not quite what I'd want. It appears that the generic export are based on elisp, whereas I prefer XSLT.
I wrote the following code for turning an org file into XML, but I haven't written the publishing transforms yet. Anyway, this may be helpful for your reference, especially as it shows the structure of an org document's internal representation.
(require 'org-element)
(defvar xml-content-encode-map
'((?& . "&amp;")
(?< . "&lt;")
(?> . "&gt;")))
(defvar xml-attribute-encode-map
(cons '(?\" . "&quot;") xml-content-encode-map))
(defun write-xml (o out parents depth)
"Writes O as XML to OUT, assuming that lists have a plist as
their second element (for representing attributes). Skips basic
cycles (elements pointing to ancestor), and compound values for
attributes."
(if (not (listp o))
;; TODO: this expression is repeated below
(princ o (lambda (charcode)
(princ
(or (aget xml-content-encode-map charcode)
(char-to-string charcode))
out)))
(unless (member o parents)
(let ((parents-and-self (cons o parents))
(attributes (second o)))
(dotimes (x depth) (princ "\t" out))
(princ "<" out)
(princ (car o) out)
(loop for x on attributes by 'cddr do
(let ((key (first x))
(value (second x)))
(when (and value (not (listp value)))
(princ " " out)
(princ (substring (symbol-name key) 1) out)
(princ "=\"" out)
(princ value (lambda (charcode)
(princ
(or (aget xml-attribute-encode-map charcode)
(char-to-string charcode))
out)))
(princ "\"" out))))
(princ ">\n" out)
(loop for e in (cddr o) do
(write-xml e out parents-and-self (+ 1 depth)))
(dotimes (x depth) (princ "\t" out))
(princ "</" out)
(princ (car o) out)
(princ ">\n" out)))))
(defun org-file-to-xml (orgfile xmlfile)
"Serialize ORGFILE file as XML to XMLFILE."
(save-excursion
(find-file orgfile)
(let ((org-doc (org-element-parse-buffer)))
(with-temp-file xmlfile
(let ((buffer (current-buffer)))
(princ "<?xml version='1.0'?>\n" buffer)
(write-xml org-doc buffer () 0)
(nxml-mode)))))
(find-file xmlfile)
(nxml-mode))
(defun org-to-xml ()
"Export the current org file to XML and open in new buffer.
Does nothing if the current buffer is not in org-mode."
(interactive)
(when (eq major-mode 'org-mode)
(org-file-to-xml
(buffer-file-name)
(concat (buffer-file-name) ".xml"))))

NLTK Clause and Phrase breakdowns

Is there a way to get NLTK to return text fully marked with all Treebank clause and Treebank phrase demarcations (or equivalent; it need not be Treebank)? I need to be able to return both clauses and phrases (separately). The only thing on this that I have found is in the NLTK Bird/Klein/Loper book in chapter 7 where it says you can not process for noun phrases and verb phrases at the same time, but I want to do much more than that! I think the Stanford POS parser does this but the client wants to use only the NLTK. Thanks.
Have you looked at chapter 8 yet? It sounds like you want something like:
>>> from nltk.corpus import treebank
>>> t = treebank.parsed_sents('wsj_0001.mrg')[0]
>>> print(t)
(S
(NP-SBJ
(NP (NNP Pierre) (NNP Vinken))
(, ,)
(ADJP (NP (CD 61) (NNS years)) (JJ old))
(, ,))
(VP
(MD will)
(VP
(VB join)
(NP (DT the) (NN board))
(PP-CLR
(IN as)
(NP (DT a) (JJ nonexecutive) (NN director)))
(NP-TMP (NNP Nov.) (CD 29))))
(. .))
in addition to the chunking resources that you have already found. But if you mean that you want to parse text you supply, there are also options like:
>>> import nltk
>>> sr_parse = nltk.ShiftReduceParser(grammar1)
>>> sent = 'Mary saw a dog'.split()
>>> for tree in sr_parse.parse(sent):
...     print(tree)
(S (NP Mary) (VP (V saw) (NP (Det a) (N dog))))
but this relies on grammar1 being populated manually beforehand. Chunking is easier than parsing.
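To make "populated manually" concrete, here is a minimal sketch of what defining grammar1 by hand could look like. The grammar below is a hypothetical toy that covers only this one sentence, not the grammar from the book, and the code assumes NLTK 3, where parse() returns an iterator of trees rather than a single tree.

```python
import nltk

# Hypothetical toy grammar, written by hand: it only knows the words
# 'Mary', 'saw', 'a' and 'dog', so any other sentence will be rejected.
grammar1 = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP
NP -> 'Mary' | Det N
Det -> 'a'
N -> 'dog'
V -> 'saw'
""")

sr_parse = nltk.ShiftReduceParser(grammar1)
sent = "Mary saw a dog".split()

# In NLTK 3, parse() yields zero or more trees; collect them into a list.
trees = list(sr_parse.parse(sent))
for tree in trees:
    print(tree)
```

Every word and every phrase pattern you want to handle has to be added to the grammar this way, which is why this approach does not scale to arbitrary English text.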

Is there an Emacs Lisp library for generating HTML?

I'm looking for a solution that allows me to write native Emacs Lisp code and at compile time turns it into HTML, like Franz's htmlgen:
(html
((:div class "post")
(:h1 "Title")
(:p "Hello, World!")))
Of course I can write my own macros, but I'm interested if there are any projects around this problem.
As you found out, xmlgen generates XML from a list structure. What I found disappointing about the xmlgen package is that the format it supports is not quite the inverse of Emacs' XML parser.
I did add this to my copy of xmlgen:
;; this creates a routine to be the inverse of what xml-parse does
;;;###autoload
(defun xml-gen (form &optional in-elm level)
"Convert a sexp to xml:
'(p :class \"big\")) => \"<p class=\\\"big\\\" />\""
(let ((level (or level 0)))
(cond
((numberp form) (number-to-string form))
((stringp form) form)
((listp form)
(destructuring-bind (xml attrs) (xml-gen-extract-plist form)
(let ((el (car xml)))
(unless (symbolp el)
(error "Element must be a symbol (got '%S')." el))
(setq el (symbol-name el))
(concat "<" el (xml-gen-attr-to-string attrs)
(if (> (length xml) 1)
(concat ">" (mapconcat
(lambda (s) (xml-gen s el (1+ level)))
(cdr xml)
"")
"</" el ">")
"/>"))))))))
(defun xml-gen-attr-to-string (plist)
(reduce 'concat (mapcar (lambda (p) (concat " " (symbol-name (car p)) "=\"" (cdr p) "\"")) plist)))
(defun xml-gen-extract-plist (list)
(list (cons (car list) (let ((kids (xml-node-children list)))
(if (= 1 (length kids))
kids
(remove-if-not 'listp kids))))
(xml-node-attributes list)))
Note: the interface for this is xml-gen (not xmlgen which is the original parsing).
With this interface, the following holds:
(string-equal (xml-gen (car (xml-parse-region <some-region-of-xml>)))
<some-region-of-xml>)
and
(equal (car (xml-parse-region (insert (xml-gen <some-xml-form>))))
<some-xml-form>)
The new xml-gen does not try to preserve the surrounding whitespace that the xml-parse-region routine generates.
This could be a starting point: http://www.emacswiki.org/emacs/HtmlLite
This is not quite what you're looking for, but there's a 20 minute video where a guy creates a simple website using UCW, the UnCommon Web application framework. It's all done in Emacs using lisp...
Here is a link to the transcript (all the code (~25 lines) is available at the end of the transcript).
Meanwhile, I found some code that contains something similar to what I want. Now I can write:
(views-with-html
((body)
(h1 "Title")
((p (class . "entry")) "Hello, World!")))
The implementation has a few limitations (e.g. hard-coded element list), but it seems to be a good starting point.
I had a similar requirement to be able to parse xml using xml-parse functions, transform it, and then output it back as a xml string.
Trey's solution almost worked except I needed to retain the whitespace xml elements. So I wrote my own implementation here:
https://github.com/upgradingdave/xml-to-string
Have you considered yaclml?
yaclml (Yet Another Common Lisp Markup Language) is an HTML generator and HTML template library. yaclml is used as the html templating backend for the ucw web framework.
https://www.cliki.net/yaclml
It is in common lisp. Not elisp. But, ...