python pylatex line spacing, units and math equations in strings - equation

I have a text block as a string that contains some SI units and equations. How can I for example use superscript numbers (e.g. 10^-10 m^2) and math equations in strings? Greek letters and e.g. the ± symbol work fine.
from pylatex import Document, Section, Subsection, Command, Figure
from pylatex.utils import italic, bold, NoEscape
doc = Document('Test', geometry_options={"head": "2cm", "margin": "2cm", "bottom": "2cm"})
with doc.create(Section('Header 1')):
    doc.append('The average area is less than 10m^2 (±0.5m^2).')
doc.generate_pdf(clean_tex=False, compiler='pdflatex')
I also wonder how I can define the line spacing (linespread) in pylatex.

Related

R Stargazer table ASCII text output formatting (line break, alignment & reference group)

For better or worse, I don't use LaTeX (yet). I like producing stargazer formatted tables on the fly for class examples in both HTML and in the console. However, I'm having trouble with 3 formatting elements; so far I've found solutions for LaTeX and some in HTML, but the ASCII console text eludes me.
The 3 challenges are:
Breaking a line so that a variable name can wrap instead of increasing the table width.
Aligning coefficients & std. errors at the decimal, even when there are p-value stars.
Making space in the covariate labels & coefficients to allow for a reference group.
Let's start with some reproducible data & outputs to reference.
library(stargazer)
set.seed(3); x1 <- factor(sample(letters[1:4], 1000, replace=TRUE))
set.seed(4); x2 <- runif(1000, -10, 10)
set.seed(5); x3 <- rbinom(1000, size = 1, prob = 0.13)
set.seed(6); y <- runif(1000, -10, 10)
model <- (lm(y ~ x1 + x2 + x3))
stargazer(model, align=TRUE,
          #type="html", out="SO_stargazer.html",
          type="text", out="SO_stargazer.txt",
          title="Example Title Goes Here",
          dep.var.caption="",
          dep.var.labels="This is my long title for the Dependent Variable Y",
          covariate.labels=c("X1 Group B",
                             "X1 Group C",
                             "X1 Group D",
                             "X2 with a super ridiculous and annoyingly long name",
                             "X3"))
Line break
My default approach is to use \n in the character string. For example, I might try to break the DV caption:
dep.var.labels="This is my long title for \n the Dependent Variable Y",
But that generates the following error message:
Error in if (nchar(text.matrix[r, c]) > max.length[real.c]) { : missing value where TRUE/FALSE needed
I found a couple of posts about this issue (here, which references here), but the poster on the first did not provide much of an example to follow, and the second either pertained to an underscore that I don't have or gave LaTeX solutions. The only difference that broke what already worked was the addition of the \n. I did try using the tex \\ escape, but that didn't do anything useful for text output.
I am able to get line breaks using <br> in the string for the html output file version.
This post also mentions the tex and html solutions, but not text.
Alignment on the decimal
When there are no statistical significance stars on coefficients, both the coefficients and std. errors align nicely, centered on the decimal point. However, once the stars appear, it 'pushes' the coefficient to the left. This happens in both the text and html output. This is not so bad with 1 star, but 3 stars can be quite a difference. How can I coerce it back to align on the decimal value for both formats? This issue persists even if I use the single.row=TRUE option. This post answer by #Marco Doe has a great visual of what I'm talking about, but noted the centering is for tex. Found a LaTeX solution, but no mention of the other formats on that post. I've tinkered with the align and float options to no avail (inspired by these quasi-related tex solution posts here and here). The latter post hinted at using xtable or post-process edits, but that was more than 5 years ago; so I'm hoping for an updated viable solution.
This image is from Marco Doe's solution and shows the LaTeX output, but it does a good job of showing the output format I get (left) and what I would like to have (right).
Reference categories
Found a LaTeX solution that 'pushes' the covariates & coefficient data down a row, making room for a reference group to be printed in the covariate column; however, the solution is in tex. How can I replicate this for the text output? Can I replicate it for the HTML version as part of the R code without having to get surgical with the HTML output code?
#Giac posted the images (linked above) to illustrate the have (left) and want (right). Although these images are tex, how could I get the right image output in text and html?

Displaying an EBNF grammar using HTML and CSS

What would be a good way to display an EBNF (or EBNF-like) grammar using HTML and CSS? I do not want to use the code tag. I want to be able to display things aligned (viz. definition symbols ‘=’ with alternation symbols ‘|’, but also remarks).
Here’s an example of a grammar I want to display, only as a simple plain text:
<expression> = <integral>
             | <variable>    (only variables of type Integer are allowed)
<integral>   = <digit>+
<variable>   = x | y | …     (any lower case latin letter)
…
The spacing within definitions like ‘<integral> = <digit>+’ should be “as usual”, i.e. not compromised by some way of aligning the definition symbols (as would be the case when using tabulars for example).

difference between "&#9;" and nbsp; or "&#32;"

Hello, I am trying to compile an EPUB v2.0 with HTML code extracted from InDesign. I have noticed there are a lot of "special characters" either at the beginning of a paragraph or at the end. For example
<p class="text_indent0px font_size0_8em line_height1_325 margin_bottom1px margin_left0px margin_right0px sans_serif floatleft">E<span class="small_caps">VELYNE</span>&#9;</p>
What is this &#9; and can I either get rid of it or replace it with a "nbsp;"?
&#9; is the ASCII code for tabs. So I guess the paragraphs were indented with tabs.
If you want to replace them with &nbsp; then use 4 of them.
That would be a horizontal tab (i.e. the same as using the tab key).
If you want to replace it, I would suggest doing a find/replace using an ePub editor like Sigil (http://sigil-ebook.com/).
&#9; represents the horizontal tab.
Similarly, &#32; represents a space.
To replace &#9; you have to use &nbsp;
In the HTML encoding &#{number};, {number} is the ASCII code. Therefore, &#9; is a tab, which typically condenses down to one space in HTML unless you use CSS (or the <pre> tag) to treat it as preformatted text.
Therefore, it's not safe to replace it with a non-breaking or a regular space unless you can guarantee that it's not being displayed as a tab anywhere.
div:first-child {
  white-space: pre;
}
<div>&#9;Test</div>
<div>&#9;Test</div>
<pre>&#9;Test</pre>
See https://developer.mozilla.org/en-US/docs/Web/CSS/white-space and http://ascii.cl/
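The collapsing behaviour described above can be approximated in Python. This is only a sketch: `collapse_whitespace` is a hypothetical helper that mimics what default HTML rendering does to runs of ASCII whitespace, and it is not how a browser is actually implemented. `html.unescape` is from the standard library.

```python
import html
import re


def collapse_whitespace(text: str) -> str:
    """Hypothetical helper: turn each run of ASCII whitespace into one space,
    roughly what `white-space: normal` does when text is rendered."""
    return re.sub(r"[ \t\n\r\f]+", " ", text)


# &#9; decodes to a real tab character.
decoded = html.unescape("&#9;Test")
print(repr(decoded))

# Two tabs between words collapse to a single space under normal rendering...
tabs_collapsed = collapse_whitespace(html.unescape("A&#9;&#9;B"))
print(repr(tabs_collapsed))

# ...but non-breaking spaces (U+00A0) are not part of the collapsible set.
nbsp_kept = collapse_whitespace(html.unescape("A&nbsp;&nbsp;B"))
print(repr(nbsp_kept))
```

With `white-space: pre` (or inside `<pre>`) no collapsing step is applied, which is why the tab stays visible in the first `<div>` and in the `<pre>` of the snippet above.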
&nbsp; is the entity used to represent a non-breaking space
&#32; is the decimal character code of the space we enter using the keyboard spacebar
&#9; is the decimal character code of the horizontal tab
&#32; and &nbsp; both represent a space, but &nbsp; is non-breaking: a line will not wrap at it, and multiple sequential occurrences will not be collapsed into one, whereas in the same case &#32; will collapse to one space
= approx. 4 spaces and approx. 8 spaces
There are four types of character reference scheme used.
Using decimal character codes (regex-pattern: &#[0-9]+;),
Using hexadecimal character codes (regex-pattern: &#x[a-f0-9]+;),
Using named character codes (regex-pattern: &[a-z]+;),
Using the actual characters (regex-pattern: .).
All these notations are rendered the same way; only the coding style differs. For example, if you need to display a Latin small letter e with diaeresis, you could use any of the conventions below:
&#235; (decimal notation),
&#xeb; (hexadecimal notation),
&euml; (html notation),
ë (actual character),
Likewise, as you said, what should be used: (a) &#9; (decimal notation), (b) &nbsp; (html notation) or (c) &#32; (decimal notation)?
So, from the above analogy, it can be said that (a), (b) and (c) are three different kinds of notation for three different characters.
And, for your information, (a) is a horizontal tab, (b) is the non-breaking space, which is actually &#160; in decimal notation, and (c) is the decimal notation for the normal space character.
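A quick way to check these equivalences is to decode each reference with Python's standard `html.unescape`. This is just an illustrative sketch; the variable names are made up for the example.

```python
import html

# The four notations for "ë" all decode to the same character (U+00EB).
notations = {
    "decimal": "&#235;",
    "hexadecimal": "&#xeb;",
    "named": "&euml;",
    "literal": "ë",
}
decoded = {name: html.unescape(ref) for name, ref in notations.items()}
print(decoded)

# The three references from the question decode to three different characters.
code_points = {ref: ord(html.unescape(ref)) for ref in ("&#9;", "&nbsp;", "&#32;")}
print(code_points)  # tab is 9, non-breaking space is 160, space is 32
```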
Now, technically, whitespace at the end of a paragraph is meaningless, so you could simply discard all of it. And if you still need such whitespace, use it inside <pre> elements, not in <p> or <div>.
Hope this helps...

HTML Special Characters for fraction with equal Numerator and Denominator

In my html page I have displayed fractions using html special character. My idea is to display 1/2, 2/2 and 3/3.
I have used &frac13; for 1/3 and &frac23; for 2/3 and the special characters are displayed correctly. I took reference from this link HTML Special Characters
But when I tried using &frac33; for 3/3 it is not working. It is just displaying as it is, not converting to special character.
Could someone please tell me what the html special character for 3/3 is?
Thank You
<sup>3</sup>&frasl;<sub>3</sub>
Result: 3⁄3
Not all fractions have their own special character. For those fractions (like 3/3) which don't have slanted fraction characters, use the HTML entity &frasl;:
<sup>3</sup>&frasl;<sub>3</sub> = 3⁄3
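The fallback described above can be sketched as a small Python helper. `fraction_markup` is a hypothetical name introduced for this example; it looks the fraction up in the standard library's HTML5 entity table (`html.entities.html5`) and falls back to the `<sup>`/`&frasl;`/`<sub>` markup when no named entity exists.

```python
from html.entities import html5


def fraction_markup(numerator: int, denominator: int) -> str:
    """Hypothetical helper: return a named fraction entity when HTML defines
    one, otherwise build the fraction from <sup>, &frasl; and <sub>."""
    name = f"frac{numerator}{denominator};"
    if name in html5:
        return "&" + name
    return f"<sup>{numerator}</sup>&frasl;<sub>{denominator}</sub>"


for n, d in [(1, 3), (2, 3), (3, 3)]:
    print(f"{n}/{d} ->", fraction_markup(n, d))
```

For 1/3 and 2/3 this yields the named entities, while 3/3 gets the constructed markup, since no `&frac33;` entity is defined.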
There is no named (or numeric) character reference for a character representing 3/3, since there simply is no such character.
In theory, the FRACTION SLASH U+2044 “⁄” character (representable as &frasl; in HTML, among other things) can be used between digits to suggest that rendering routines present the combination as a typographic fraction. In practice, only some typesetting programs can do this, and web browsers come nowhere near.
Trying to play with HTML markup and/or CSS to construct something that looks like a typographic fraction (comparable to ½ in appearance) tends to produce messy results, including uneven line spacing.
The practical option is to use just common notations like 2/2. But if you want something like a typographic fraction, you could use MathML with MathJax. More exactly, you would use the mfrac element in MathML with the attribute bevelled="true". Sample code:
<!doctype html>
<title>Fractions with MathJax and MathML</title>
<script src=
"http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
Here we have the common fraction ½, then
a simulation with HTML and CSS:
<sup>1</sup>&frasl;<sub>2</sub>.
Note that this tends to create uneven line spacing.
There are some cures to that, but let us see how MathML works:
<math>
<mfrac bevelled="true">
<mn>1</mn>
<mn>2</mn>
</mfrac>
</math>.
Some text here to demonstrate that line spacing has not
been disturbed here.

Text blocks positions and sizes detection in command line mode in tesseract

Tesseract OCR has a command-line interface, which allows us to recognize text from images with some parameters.
The input arguments are imagename (path to the image), outputbase (name of the recognized-text output) and the -psm pagesegmode parameter.
pagesegmode values are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
-l lang and/or -psm pagesegmode must occur before anyconfigfile.
But can the library write the positions and sizes of recognized text blocks to a specific file, or is that internal information only?
Tesseract 3.0x supports a "hocr" command option, which produces an HTML-format output file consisting of recognized words and their coordinates. It does not have size/font info, though.
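Once such a file exists, the word positions and sizes can be pulled out with a few lines of Python. The sketch below assumes the hOCR convention that each word is a span with class `ocrx_word` (or `ocr_word`) whose `title` attribute carries `bbox x0 y0 x1 y1`. The sample markup is made up for illustration, not real Tesseract output, and `HocrWords` is a hypothetical class name.

```python
import re
from html.parser import HTMLParser

BBOX = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWords(HTMLParser):
    """Collect the text, position and size of every word span in hOCR markup."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word span we are currently inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        match = BBOX.search(attrs.get("title") or "")
        if match and ("ocrx_word" in classes or "ocr_word" in classes):
            self._bbox = tuple(int(v) for v in match.groups())

    def handle_data(self, data):
        if self._bbox and data.strip():
            x0, y0, x1, y1 = self._bbox
            self.words.append({"text": data.strip(), "left": x0, "top": y0,
                               "width": x1 - x0, "height": y1 - y0})
            self._bbox = None


# Made-up hOCR fragment, for illustration only.
SAMPLE = """
<div class='ocr_page' title='bbox 0 0 640 480'>
 <span class='ocr_line' title='bbox 36 92 300 116'>
  <span class='ocrx_word' title='bbox 36 92 96 116'>Hello</span>
  <span class='ocrx_word' title='bbox 109 92 300 116'>world</span>
 </span>
</div>
"""

parser = HocrWords()
parser.feed(SAMPLE)
for word in parser.words:
    print(word)
```

Width and height here are simply derived from the bounding-box corners; font information is not available from this output, as noted above.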