Visualise HTML video with matplotlib animation

In my notebook I get some data from a URL, perform some analysis and do some plotting.
I also want to create an HTML animation using FuncAnimation from matplotlib.animation.
So in the preamble I do
import matplotlib.animation as manim
plt.rcParams["animation.html"] = "html5"
%matplotlib inline
(something else... def init()..., def animate(i)...) then
anima = manim.FuncAnimation(fig,
                            animate,
                            init_func=init,
                            frames=len(ypos)-d0,
                            interval=200,
                            repeat=False,
                            blit=True)
To visualise, I then call
FFMpegWriter = manim.writers['ffmpeg']
writer = FFMpegWriter(fps=15)
link = anima.to_html5_video()
from IPython.core.display import display, HTML
display(HTML(link))
because I want the clip to show up as a neat HTML video in the notebook.
While this works well on my machine, on Watson Studio I get the following error:
RuntimeError: Requested MovieWriter (ffmpeg) not available
I've checked that ffmpeg is available in the form of a Python package
(!pip freeze --isolated | grep ffmpeg gives ffmpeg-python==0.2.0)
The question is: how can I tell matplotlib.animation.writers to use the codec in ffmpeg-python?
Many thanks to all responders and supporters

We currently don't have ffmpeg pre-installed in Watson Studio on Cloud. The package ffmpeg-python that you mention is just a Python wrapper, but it won't work without the actual ffmpeg.
You can install ffmpeg from conda:
!conda install ffmpeg
Once you have the full list of additional packages that your notebook needs, I recommend creating a custom environment. Then you don't have to put install commands into the actual notebook.
The customization might look like this:
dependencies:
  - ffmpeg=4.2.2
  - pip
  - pip:
    - ffmpeg-python==0.2.0
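If ffmpeg does get installed but matplotlib still cannot find the binary (for example because it is not on the kernel's PATH), you can also point matplotlib at it explicitly through the animation.ffmpeg_path setting; the path below is only an example of where a conda install might put it:
import matplotlib.pyplot as plt
import matplotlib.animation as manim

# Point matplotlib at the ffmpeg binary explicitly if it is not on the kernel's
# PATH; the location below is only an example of where a conda install might put it.
plt.rcParams["animation.ffmpeg_path"] = "/opt/conda/bin/ffmpeg"

# The ffmpeg writer should now be registered, and anima.to_html5_video() should work.
print(manim.writers.is_available("ffmpeg"))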

Related

Docstrings are not generated on Read the Docs with Sphinx autodoc and napoleon extensions

I am using the Sphinx autodoc and napoleon extensions to generate the documentation for my project (Qtools). This works well on my local machines. I am using Sphinx 3.1.2 (or higher). However, when I build the documentation on Read the Docs (RTD), only text added directly to the reStructuredText files that form the source of the documentation is processed. The docstrings that are supposed to be pulled in by autodoc do not appear in the HTML documentation generated by RTD. So for example in docs\source\section2_rsdoc.rst I have:
Response spectra
================

The response spectrum class
---------------------------

.. autoclass:: qtools.ResponseSpectrum
   :members:

Response spectrum creation
--------------------------

.. autofunction:: qtools.calcrs
.. autofunction:: qtools.calcrs_cmp
.. autofunction:: qtools.loadrs

See also :func:`qtools.convert2rs` (converts a power spectrum into a response spectrum).
This results in:
Response spectra
The response spectrum class
Response spectrum creation
See also qtools.convert2rs (converts a power spectrum into a response spectrum).
In other words, all directives are apparently ignored, and hyperlinks to other functions are not added. I have examined several basic guidance documents such as this one, but I cannot figure out what I am doing wrong. RTD builds the documentation without any errors or warnings. In RTD advanced settings I have:
Documentation type: Sphinx HTML
Requirements file: requirements.txt
Python interpreter: CPython 3.x
Install Project: no
Use system packages: no
Python configuration file: blank
Enable PDF build: no
Enable EPUB build: no
I haven't touched any other settings.
In conf.py I have tried the following variations of line 15: sys.path.insert(0, os.path.abspath('.')), sys.path.insert(0, os.path.abspath('../..')) and the current sys.path.insert(0, os.path.abspath('../../..')). None of those made any difference.
I would be grateful for any help!
RTD builds the documentation without any errors or warnings
This is slightly incorrect. As you can see in the build logs, autodoc is emitting numerous warnings like this one:
WARNING: autodoc: failed to import class 'ResponseSpectrum' from module 'qtools'; the following exception was raised:
No module named 'qtools'
This has happened for all your variations of sys.path.insert, as you can see in some past builds.
Trying to make it work this way is tricky, since Read the Docs does some magic to guess the directory where your documentation is located, and also the working directory changes between commands.
Instead, there are two options:
Locate the directory where conf.py lives (see How do you properly determine the current script directory?) and work out the path to your package relative to it, as in the sketch after this list.
Invest some time into making your code installable using up-to-date Python packaging standards, for example putting all your sources inside a qtools directory, and creating an appropriate pyproject.toml file using flit.
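For the first option, a minimal sketch of the top of conf.py, assuming a layout of docs/source/conf.py with the qtools package at the repository root (adjust the number of '..' levels to your layout):
import os
import sys

# Resolve the package path relative to conf.py itself rather than the current
# working directory, which changes between Read the Docs build commands.
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")))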

Embed Jupyter HTML output in a web page

I want to embed the HTML output of Jupyter, in my own web page. The reason for this is primarily, so that I can use Jupyter from my own webapp - and also access my research notebooks from anywhere in the world - via the internet.
A typical use case scenario would be that I click on a button on my page, and an iframe will be inserted in my page; Jupyter will then be launched at the backend (if not already running), and the output of Jupyter will be 'piped' to the iframe - so that I can use Jupyter from within my page.
The naive solution, it appeared, was to use <iframe>, but there were two problems:
The iframe cross-domain policy problem
Jupyter generates a one-time authentication token when first launched
Is there any way I can overcome these issues, so I can embed the output of Jupyter in my own web page?
You need to check nbconvert - https://github.com/jupyter/nbconvert
There you have two options:
use the command line to run the notebook and then let some web server serve the resulting .html, or
use Python and the nbconvert library.
Here is some short code:
If you want to show an already executed notebook:
import nbformat
from nbconvert import HTMLExporter

with open("path/to/notebook.ipynb") as ff:  # path to your notebook file
    src_notebook = nbformat.reads(ff.read(), as_version=4)

html_exporter = HTMLExporter()
html_exporter.template_file = 'basic'  # 'basic' skips the <html> and <body> tags; use 'full' for a complete page
(body, resources) = html_exporter.from_notebook_node(src_notebook)
print(body)  # body holds the HTML output
If you also want to run the notebook first:
import nbformat
from nbconvert import HTMLExporter
from nbconvert.preprocessors import ExecutePreprocessor

with open("path/to/notebook.ipynb") as ff:  # path to your notebook file
    src_notebook = nbformat.reads(ff.read(), as_version=4)

ep = ExecutePreprocessor(timeout=50, kernel_name='python3')
ep.preprocess(src_notebook, {})

html_exporter = HTMLExporter()
html_exporter.template_file = 'basic'  # 'basic' skips the <html> and <body> tags; use 'full' for a complete page
(body, resources) = html_exporter.from_notebook_node(src_notebook)
print(body)  # body holds the HTML output
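If you want to embed this in your own web page, you can then write body to a file that your web server (or the iframe) serves; the output path below is only an example:
# Write the exported HTML where your web server can pick it up
# (the path is only an example).
with open("/var/www/html/notebook_output.html", "w", encoding="utf-8") as out:
    out.write(body)  # body comes from the snippet above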
You can directly do that using the html_embed pre-processor:
$ jupyter nbconvert --to html_embed Annex.ipynb
[NbConvertApp] Converting notebook Annex.ipynb to html_embed
/usr/local/lib/python3.6/site-packages/nbconvert/filters/datatypefilter.py:41: UserWarning: Your element with mimetype(s) dict_keys(['image/pdf']) is not able to be represented.
mimetypes=output.keys())
[NbConvertApp] Writing 2624499 bytes to Annex.html
Strangely, I could not find a direct reference to it in the nbconvert manual.
You can use ipython nbconvert --to html notebook.ipynb to obtain the HTML code for the same.
Here is a guide on how to do it: Blogging with the IPython notebook - see here.
If your website is written in Python, use the Python embed docs.
Also this tutorial - see here.
Or use kyso.io.
Here is how to embed Jupyter using the Kyso platform - see here.
(disclaimer - I'm a founder of kyso)

using cd! in ammonite scripts fails in 0.7.8, worked in earlier version

With Ammonite 0.7.0, using cd! in scripts would change you to that directory and execute the following bit of code, which was great, as I've been using Ammonite for build & deploy of a Scala project.
But in 0.7.8 this does not work any longer, it fails like...
cat TestCd.sc
import ammonite.ops._
import ammonite.ops.ImplicitWd._
cd! root/'Users/'jeff
Error:
TestCd.sc:4: not found: value cd
val res_2 = cd! root/'Users/'jeff
I can make it work in this (very) small test by changing the code to import and instantiate an ammonite.shell.ShellSession, but that leads to other issues.
I've asked on Gitter and in GitHub issues; I thought I'd cast a wider net as I've not received responses.
Thanks in advance. I don't want to stay on an old version or rewrite the deployment script in a more mature scripting language, as I'm using Scala for other things, and feel this is critical to writing shell scripts in any language.
Jeff
While it would be nice if this just worked, an item I missed is that you can install a custom ~/.ammonite/predefScript.sc, and this is how I've gotten around the issue. The contents are identical to predef.sc without the final line. Feel free to grab it from this gist if you need it as well.
predefScript.sc - Gist
Add it to your system with
mkdir -p ~/.ammonite && curl -L -o ~/.ammonite/predefScript.sc https://git.io/v1vv7

Python: How Can I Render HTML Code And Show The Result To The User?

I'm building a Python app that should take some HTML code, render it, and display the result to the user in a tkinter GUI. How can I do that? I would prefer a built-in module, or some module I can get with easy_install. Thanks in advance.
(I'm using OSX Yosemite with python 2.7)
I've managed to render simple HTML tags using tkhtml.
Just pip3 install tkinterhtml
and, from the package example:
import urllib.request
import tkinter as tk
from tkinterhtml import HtmlFrame

root = tk.Tk()
frame = HtmlFrame(root, horizontal_scrollbar="auto")
frame.pack(fill="both", expand=True)  # make the frame fill the window
frame.set_content("<html></html>")
frame.set_content(urllib.request.urlopen("http://thonny.cs.ut.ee").read().decode())
root.mainloop()
Hope it helps :)
Save your HTML to a file location, and use the webbrowser module's open() function to display it; see https://docs.python.org/2/library/webbrowser.html for documentation.
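For example, a minimal sketch (the file name and HTML content are just placeholders):
import os
import webbrowser

html = "<html><body><h1>Hello</h1></body></html>"  # placeholder HTML
path = os.path.abspath("page.html")                # placeholder file name
with open(path, "w") as f:
    f.write(html)
webbrowser.open("file://" + path)  # opens the rendered page in the default browser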

How do I segment a document using Tesseract then output the resulting bounding boxes and labels

I'm trying to get Tesseract to output a file with labelled bounding boxes that result from page segmentation (pre OCR). I know it must be capable of doing this 'out of the box' because of the results shown at the ICDAR competitions, where contestants had to segment various documents (academic paper here). Here's an example from that paper illustrating what I want to create:
I have built the latest version of tesseract using brew, brew install tesseract --HEAD, and have been trying to edit config files located in /usr/local/Cellar/tesseract/HEAD/share/tessdata/configs/ to output labelled boxes. The output received using hocr as the config, i.e.
tesseract infile.tiff outfile_stem -l eng -psm 1 hocr
gives a bounding box for everything and has some labelling in class tags e.g.
<p class='ocr_par' dir='ltr' id='par_5_82' title="bbox 2194 4490 3842 4589">
<span class='ocr_line' id='line_5_142' ...
but I can't visualise this. Is there a standard tool to visualize hOCR files, or is the facility to create an output file with bounding boxes built into Tesseract?
The current head version details:
tesseract 3.04.00
leptonica-1.71
libjpeg 8d : libpng 1.6.16 : libtiff 4.0.3 : zlib 1.2.5
Edit
I'm really looking to achieve this using the command line tool (as in the examples above). #nguyenq has pointed me to the API reference; unfortunately I have no C++ experience. If the only solution is to use the API, please can you provide a quick Python example?
Success. Many thanks to the people at the Pattern Recognition and Image Analysis Research Lab (PRImA) for producing tools to handle this. You can obtain them freely on their website or github.
Below I give the full solution for a Mac running 10.10, using the Homebrew package manager. I use Wine to run the Windows executables.
Overview
Download tools: Tesseract OCR to Page (TPT) and Page Viewer (PVT)
Use the TPT to run tesseract on your document and convert the HOCR xml to a PAGE xml
Use the PVT to view the original image with the PAGE xml information overlaid
Code
brew install wine # takes a little while >10m
brew install gs   # only for generating a tif example. Not required, you can use Preview
brew install wget # only for downloading example paper. Not required, you can do so manually!

cd ~/Downloads
wget -O paper.pdf "http://www.prima.cse.salford.ac.uk/www/assets/papers/ICDAR2013_Antonacopoulos_HNLA2013.pdf"

# This command can be omitted and you can do the conversion to tiff with Preview
gs \
  -o paper-%d.tif \
  -sDEVICE=tiff24nc \
  -r300x300 \
  paper.pdf

cd ~/Downloads
dl=~
# ttptool is the location you downloaded the Tesseract to PAGE tool to
ttptool="/Users/Me/Project/tools/TesseractToPAGE 1.3"
# sudo chmod 777 "$ttptool/bin/PRImA_Tesseract-1-3-78.exe"
touch "$ttptool/log.txt"
wine "$ttptool/bin/PRImA_Tesseract-1-3-78.exe" \
  -inp-img "$dl/Downloads/paper-3.tif" \
  -out-xml "$dl/Downloads/paper-3-tool.xml" \
  -rec-mode layout >> "$ttptool/log.txt"

# pvtool is the location you downloaded the PAGE Viewer tool to
pvtool="/Users/Me/Project/tools/PAGEViewerMacOS_1.1/JPageViewer 1.1 (Mac OS, 64 bit)"
cd "$pvtool"
java -XstartOnFirstThread -jar JPageViewer.jar "$dl/Downloads/paper-3-tool.xml" "$dl/Downloads/paper-3.tif"
Results
Document with overlays (rollover to see text and type)
Overlays alone (use GUI buttons to toggle)
Appendix
You can run tesseract yourself and use another tool to convert its output to PAGE format. I was unable to get this to work but I'm sure you'll be fine!
# Note that the pvtool does take as input HOCR xml but it ignores the region type
brew install tesseract --devel # installs v 3.03 at time of writing
tesseract ~/Downloads/paper-3.tif ~/Downloads/paper-3 hocr
mv paper-3.hocr paper-3.xml # The page viewer will only open XML files
java -XstartOnFirstThread -jar JPageViewer.jar "$dl/Downloads/paper-3.xml"
At this point you need to use the PAGE Converter Java Tool to convert the HOCR xml into a PAGE xml. It should go a little something like this:
pctool="/Users/Me/Project/tools/JPageConverter 1.0"
java -jar "$pctool/PageConverter.jar" -source-xml paper-3.xml -target-xml paper-3-hocrconvert.xml -convert-to LATEST
Unfortunately, I kept getting null pointers.
Could not convert to target XML schema format.
java.lang.NullPointerException
at org.primaresearch.dla.page.converter.PageConverter.run(PageConverter.java:126)
at org.primaresearch.dla.page.converter.PageConverter.main(PageConverter.java:65)
Could not save target PAGE XML file: paper-3-hocrconvert.xml
java.lang.NullPointerException
at org.primaresearch.dla.page.io.xml.XmlInputOutput.writePage(XmlInputOutput.java:144)
at org.primaresearch.dla.page.converter.PageConverter.run(PageConverter.java:135)
at org.primaresearch.dla.page.converter.PageConverter.main(PageConverter.java:65)
You can use its API to obtain the bounding boxes at various levels (character/word/line/para) -- see API Example. You have to draw the labels yourself.
If you are familiar with Python, you can directly use the tesserocr library, which is a nice Python wrapper around the C++ API. Here is a code snippet that draws polygons at block level using PIL:
from PIL import Image, ImageDraw
from tesserocr import PyTessBaseAPI, RIL, iterate_level, PSM

img = Image.open(filename)
results = []

with PyTessBaseAPI() as api:
    api.SetImage(img)
    api.SetPageSegMode(PSM.AUTO_ONLY)
    iterator = api.AnalyseLayout()
    for w in iterate_level(iterator, RIL.BLOCK):
        if w is not None:
            results.append((w.BlockType(), w.BlockPolygon()))

print('Found {} block elements.'.format(len(results)))

draw = ImageDraw.Draw(img)
for block_type, poly in results:
    # you can define a color per block type (see tesserocr.PT for the block types list)
    draw.line(poly + [poly[0]], fill=(0, 255, 0), width=2)
With Tesseract 4.0.0, a command like tesseract source/dir/myimage.tiff target/directory/basefilename hocr will create a basefilename.hocr file with block-, paragraph-, line-, and word-level bounding boxes for the OCR'ed text. Even the command without the hocr config creates a text file with newlines between block-level text, but the hocr format is more explicit.
More config options here: https://github.com/tesseract-ocr/tesseract/tree/master/tessdata/configs
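If you need the coordinates programmatically rather than visually, the bbox entries in the title attributes of the generated .hocr file (see the hOCR snippet in the question above) can be extracted with a few lines of Python; this is only a sketch keyed to that attribute format, and the file name is an example:
import re

# Read the hOCR produced by tesseract (the file name is just an example).
with open("basefilename.hocr", encoding="utf-8") as f:
    hocr = f.read()

# Each hOCR element carries its class (ocr_par, ocr_line, ...) and a title
# attribute of the form "bbox x0 y0 x1 y1".
pattern = re.compile(r"""class='(\w+)'[^>]*title="bbox (\d+) (\d+) (\d+) (\d+)""")
for cls, x0, y0, x1, y1 in pattern.findall(hocr):
    print(cls, (int(x0), int(y0), int(x1), int(y1)))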
Shortcut
It is also possible to open HOCR files directly with the PageViewer tool. The file extension has to be .xml, however.
HOCR output with individual character boxes has been available in Tesseract since 4.1.
Once the installation is confirmed, use:
tesseract {image file} {output name} -c tessedit_create_hocr=1 -c hocr_char_boxes=1