Can a number register be used in a groff request? - units-of-measurement

Can someone explain to me why embedding literal values (e.g., "1c") in the .po ("page offset") request works but referencing a number register does not?
.\" Set the dimensions of an A4 page.
.nr a4_width 21c
.nr a4_height 29.7c
.nr a4_margin_horizontal 1c
.nr a4_margin_vertical 1c
.nr a4_content (\n[a4_width] - (\n[a4_margin_horizontal] * 2))
.
.\" Page-offset and line-length
.po \n[a4_margin_horizontal]
.ll \n[a4_content]
.\" Uncomment the below two lines and everything works.
.\" .po 1c
.\" .ll 19c
.
.\" Start the document.
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod
tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At
vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren,
no sea takimata sanctus est Lorem ipsum dolor sit amet.
.sp

When you write the following line:
.nr a4_margin_horizontal 1c
the letter c is a scale indicator (centimeters), so the value stored in the register is converted to basic units: about 28346 on a device with 72000 basic units per inch.
When you write later:
.po \n[a4_margin_horizontal]
there is no scale indicator, so the interpreter falls back on the request's default scale indicator, which for .po is m (ems), according to the Line Layout section of the manual.
To make it work, append the u (basic unit) scale indicator after the \n escape:
.po \n[a4_margin_horizontal]u
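To make the arithmetic concrete, here is a minimal Python model (not groff code) of the conversions described above. It assumes groff's ps device (72000 basic units per inch) and the default 10-point type size, so 1m = 10p = 10000u; SCALE and to_units are hypothetical names, and only a few scale indicators are modeled.

```python
UNITS_PER_INCH = 72000                  # assumption: ps device resolution
POINT_SIZE = 10                         # assumption: current type size in points
UNITS_PER_POINT = UNITS_PER_INCH / 72   # 1000 basic units per point

# Hypothetical table: basic units per 1 of each scale indicator.
SCALE = {
    "i": UNITS_PER_INCH,                # inch
    "c": UNITS_PER_INCH / 2.54,         # centimeter
    "p": UNITS_PER_POINT,               # point
    "m": POINT_SIZE * UNITS_PER_POINT,  # em = current type size
    "u": 1,                             # basic unit
}


def to_units(value, indicator):
    """Convert a value with a scale indicator to whole basic units."""
    return round(value * SCALE[indicator])


stored = to_units(1, "c")        # .nr a4_margin_horizontal 1c
misread = to_units(stored, "m")  # .po \n[a4_margin_horizontal]  (bare number read as m)
fixed = to_units(stored, "u")    # .po \n[a4_margin_horizontal]u

print(stored)                    # 28346
print(misread / UNITS_PER_INCH)  # the page offset in inches when read as ems
print(fixed == to_units(1, "c")) # True
```

In this model the misread offset is roughly 3937 inches, far off the page, which would explain why the output looks broken until the u is added.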

It's funny how I bang my head against the wall for a full hour, then when I finally ask my question on StackOverflow, I figure it out.
From the online manual:
gtroff (like many other programs) requires numeric parameters to specify
various measurements. Most numeric parameters may have a measurement unit
attached. These units are specified as a single character that immediately
follows the number or expression. Each of these units are understood, by
gtroff, to be a multiple of its basic unit. So, whenever a different
measurement unit is specified gtroff converts this into its basic units. This
basic unit, represented by a ‘u’, is a device dependent measurement, which is
quite small, ranging from 1/75th to 1/72000th of an inch. The values may be
given as fractional numbers; however, fractional basic units are always rounded
to integers.
My measure of 1c (1 centimeter) was converted to 28346 "basic units" (72,000 per inch), and the .po request, which received a bare number, applied its default unit of measure instead of treating that number as basic units.
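Following the same reasoning, a minimal Python model of the question's registers shows what .ll \n[a4_content]u would receive once the u is appended. It assumes again 72000 basic units per inch and that each .nr value is rounded to a whole unit; the cm helper is hypothetical, and groff's own integer arithmetic may differ by a unit.

```python
UNITS_PER_INCH = 72000  # assumption: ps device resolution


def cm(x):
    """Hypothetical helper: centimeters -> whole basic units."""
    return round(x * UNITS_PER_INCH / 2.54)


# Mirrors the .nr requests in the question; all values are in basic units.
a4_width = cm(21)
a4_margin_horizontal = cm(1)
a4_content = a4_width - a4_margin_horizontal * 2

print(a4_width, a4_margin_horizontal, a4_content)  # 595276 28346 538584
print(a4_content - cm(19))  # 1: rounding each term separately drifts by one unit
```

So with the u appended, the computed line length matches the commented-out .ll 19c to within one basic unit in this model.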

Related

Query text for two expressions within distance of n words

I'm looking for a way to query a text column in a MySQL database for cases where two expressions occur within a distance of n words or characters. For example: "sed" within 3 words of "eirmod" in the following sentences.
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam
nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam
erat, sed diam voluptua. At vero eos et accusam et justo duo dolores
et ea rebum.
I couldn't find a way to express this as a condition in a SELECT with LIKE:
SELECT * FROM `database` AS db
WHERE db.text like '%sed%' AND db.text like '%eirmod%'
Is there a way to do it, or is a completely different approach necessary? Thank you in advance.
You can use RLIKE here:
SELECT *
FROM yourTable
WHERE text RLIKE 'sed ([^ ]* ){0,3}eirmod'
If you want to find sed and eirmod within 3 words of each other in any order, then you can expand the regex with an alternation:
SELECT *
FROM yourTable
WHERE text RLIKE 'sed ([^ ]* ){0,3}eirmod|eirmod ([^ ]* ){0,3}sed'
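The regular expressions above can be tried outside MySQL before running them against the table. This Python sketch uses the re module, which I would expect to treat these simple patterns ([^ ]*, {0,3}, |) the same way MySQL's REGEXP/RLIKE does; within_n_words is a hypothetical cross-check that tokenizes the text and counts the words in between. Like the SQL version, the patterns have no word boundaries, so "sed" would also match inside a word such as "used".

```python
import re

TEXT = ("Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam "
        "nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam "
        "erat, sed diam voluptua. At vero eos et accusam et justo duo dolores "
        "et ea rebum.")

# Same patterns as the RLIKE answer: at most 3 space-separated words in between.
FORWARD = re.compile(r"sed ([^ ]* ){0,3}eirmod")
EITHER = re.compile(r"sed ([^ ]* ){0,3}eirmod|eirmod ([^ ]* ){0,3}sed")


def within_n_words(text, a, b, n):
    """True if some occurrence of word a and word b have at most n words between them."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) - 1 <= n for i in pos_a for j in pos_b)


print(bool(FORWARD.search(TEXT)))                  # True
print(within_n_words(TEXT, "sed", "eirmod", 3))    # True
print(bool(EITHER.search("eirmod a b sed")))       # True
print(bool(FORWARD.search("sed a b c d eirmod")))  # False: 4 words in between
```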
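You can also try a regex with POSIX character classes (note that ^ anchors the match to the start of the text, and this pattern allows exactly two words in between):
select * from yourTable
where text regexp '^sed[[:blank:]][[:alpha:]]+[[:blank:]][[:alpha:]]+[[:blank:]]eirmod'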

Process HTML Entities in CDATA Element

I am currently working on an XSLT transformation that changes the structure of some XML documents into the structure required by an external service provider.
My source document contains a very large element with CDATA content like this:
<ABC>
<![CDATA[
Lorem ipsum dolor sit amet<br><br>
onsetetur sadipscing elitr, sed diam nonumy eirmod tempor<br>
At vero eos et &auml;ccusam et justo duo dolores et ea rebum
..."LARGE CONTENT"...
]]>
</ABC>
Note that the text contains unclosed <br> elements and many different HTML entities, such as &auml;.
The desired result in my destination document should look like this:
<p>
Lorem ipsum dolor sit amet<br/><br/>
onsetetur sadipscing elitr, sed diam nonumy eirmod tempor<br/>
At vero eos et äccusam et justo duo dolores et ea rebum
..."LARGE CONTENT"...
</p>
No CDATA, the <br> elements are closed (so the result is well-formed XML), and the HTML entities are transformed into Unicode characters, as in the example: &auml; --> ä.
The exceptions are the entities that have to stay escaped in XML, i.e. those for <, >, ", ' and &.
My way to process this:
<xsl:template match="ABC">
<xsl:variable name="temp" select="replace(text(),'&amp;auml;','ä')"/>
<!--[... many replacement rules for HTML entities...]-->
<xsl:value-of select="replace($temp,'&lt;br&gt;','&lt;br/&gt;')" disable-output-escaping="yes"/>
</xsl:template>
This template fulfills the requirements, but it needs many replacement rules and seems very cumbersome and inefficient.
Is there a better way to process this unescaping of HTML entities?
If you want to parse an HTML fragment or an HTML document and you use a commercial edition of Saxon 9 (PE or EE), it provides HTML parsing with the help of TagSoup, exposed as the extension function saxon:parse-html (in the namespace http://saxon.sf.net/, see http://www.saxonica.com/documentation/index.html#!functions/saxon/parse-html), which could be called like this:
<xsl:template match="ABC">
<p>
<xsl:apply-templates select="saxon:parse-html(.)/node()"/>
</p>
</xsl:template>
or similar to process the nodes created by the TagSoup HTML parser.
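Without Saxon PE/EE, the replacement idea from the question can at least be driven by a complete entity table instead of hand-written replace() calls. The following Python sketch is neither XSLT nor part of the Saxon answer: it assumes the CDATA text has already been extracted as a string, uses the standard library's html.entities.html5 table, keeps the five XML entities escaped, and closes <br> tags. The function names are hypothetical, and numeric references such as &#228; are not handled.

```python
import re
from html.entities import html5  # maps e.g. "auml;" -> "ä"

# Entities that must stay escaped in well-formed XML.
XML_ENTITIES = {"amp", "lt", "gt", "quot", "apos"}
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
BR = re.compile(r"<br\s*/?>", re.IGNORECASE)


def unescape_html_entities(text):
    """Replace named HTML entities with characters, except the XML ones."""
    def replace(match):
        name = match.group(1)
        if name in XML_ENTITIES:
            return match.group(0)
        # Unknown entity names are left untouched.
        return html5.get(name + ";", match.group(0))
    return ENTITY.sub(replace, text)


def close_br(text):
    """Rewrite <br>, <BR> and <br /> as the self-closing <br/>."""
    return BR.sub("<br/>", text)


cdata = "Lorem ipsum<br><br>\nAt vero eos et &auml;ccusam &amp; &lt;b&gt;"
print(close_br(unescape_html_entities(cdata)))
```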

How to "grep" lines with html tags from a sql text field?

A MySQL table has more than 74,000 rows with a field named "body" of type TEXT.
I need a view that contains only the lines that contain HTML tags.
Example:
Record 1 =>
Lorem ipsum dolor sit amet, consetetur sadipscing elitr\n
sed diam nonumy eirmod temporary invidunt ut labore et dolore\n
<hr>
aliquyam magna erat, sed diam voluptua.
Record 2 =>
At vero eos et <strong>accusam</strong> et justo duo dolores et ea rebum.\n
Stet clita kasd gubergren, No sea takimata sanctus est Lorem\n
ipsum dolor sit amet.
Record 3 =>
Lorem ipsum dolor sit amet, consetetur sadipscing elitr\n
<ul><li>sed</li> <li>diam</li></ul> nonumy eirmod temporary invidunt ut labore et dolore\n
aliquyam magna erat, sed diam voluptua.
The output should contain only the rows with the HTML tags:
Record 1 =>
Lorem ipsum dolor sit amet, <a href="http://foo.bar">consetetur</ a> sadipscing elitr\n
Record 2 =>
At vero eos et <strong>accusam</strong> et justo duo dolores et ea rebum.\n
Record 3 =>
<ul><li>sed</li> <li>diam</li></ul> nonumy eirmod temporary invidunt ut labore et dolore\n
I need the output from a script run for manual review.
Does anyone have an idea for a corresponding sql select statement, e.g.
SELECT `body` FROM `messages` WHERE `body` REGEXP -> `<regexp_for_html-tags_here>`;
or something like this.
regards
If you want to use regex you could do something like this:
SELECT body
FROM messages
WHERE body REGEXP '.*<[[:alpha:]][[:alnum:]]*.*>.*';
EDIT
In the comments, some have mentioned performance. You may be able to combine LIKE and REGEXP to improve it: the LIKE condition narrows the data to the candidate rows, then the REGEXP refines the search (e.g. to skip rows where these characters appear but not around a potential tag name).
SELECT body
FROM messages
WHERE body like '%<%>%'
and body REGEXP '.*<[[:alpha:]][[:alnum:]]*.*>.*';
http://sqlfiddle.com/#!2/70c47/2
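The queries above return whole rows, while the question asks for only the lines that contain tags. One way, sketched in Python under the assumption that the review script fetches each body and that lines are separated by real newline characters, is to filter the lines after the query. TAG mirrors [[:alpha:]][[:alnum:]]* for ASCII tag names, and lines_with_tags is a hypothetical helper.

```python
import re

# An opening tag: "<", a letter, optional letters/digits, then anything up to ">".
TAG = re.compile(r"<[A-Za-z][A-Za-z0-9]*[^>]*>")


def lines_with_tags(body):
    """Return only the lines of body that contain an opening HTML tag."""
    return [line for line in body.splitlines() if TAG.search(line)]


record_1 = ("Lorem ipsum dolor sit amet, consetetur sadipscing elitr\n"
            "sed diam nonumy eirmod temporary invidunt ut labore et dolore\n"
            "<hr>\n"
            "aliquyam magna erat, sed diam voluptua.")
print(lines_with_tags(record_1))  # ['<hr>']
```

Running the REGEXP in SQL first and this filter in the script keeps the line splitting out of the database.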

Zend PDF: How do I print bold text or text in another color etc?

I have a long string, like:
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
Some of the words should be bold and some text should have another color and so on. Is it somehow possible to give the drawText function a string that already contains the correct syntax so the PDF will have the bold text and so on? Something like this:
$text = "my <b>text</b>...";
$page->drawText($text, 100, ($page->getHeight() - 100));
It doesn't work with HTML, but is there something similar?
Thanks!
The short answer to your question is no, Zend Framework does not provide the formatting functions that you are looking for.
Zend_Pdf provides the primitive functions for drawing text, lines, circles, etc., onto the page, but that's about it. If you want to bold some text in the middle of a line, you have to draw the first bit of text, change the font style to bold, draw the bit of text you want bolded, switch back to the original font style and then draw the remainder of the line. And you have to look after line wrapping, page breaks, etc., yourself too.
I wrote a blog post some time ago that talks about these challenges in more depth and have posted a wrapper class on github that makes Zend_Pdf a little easier to use. The post is here: http://yetanotherprogrammingblog.com/content/zend_pdf-wrapper-and-sample-code and the wrapper class is here: https://github.com/jamesggordon/Wrap_Pdf. Unfortunately this version of the class doesn't do precisely what you want, but it shouldn't be too hard to modify the writeText() method to implement the font changing system that you're after.
I too am looking for something similar. We have some legacy code that renders a PDF using Zend PDF; it's very complicated.
I have in the past used something called DomPDF - this is probably what you need as it just turns HTML into PDF for you - very easy to use!
http://dompdf.github.io/
It can be done; check out zend.pdf.drawing:
http://framework.zend.com/manual/1.12/en/zend.pdf.drawing.html
You'd need to break your PHP strings up, then change the Zend PDF drawing styles between each PHP string.
$style = new \Zend_Pdf_Style(); // create the style object before configuring it
$style->setFont(\Zend_Pdf_Font::fontWithName(\Zend_Pdf_Font::FONT_TIMES_BOLD), 12);
$page->setStyle($style);
$page->drawText(....);
$style->setFont(\Zend_Pdf_Font::fontWithName(\Zend_Pdf_Font::FONT_TIMES), 10);
$page->setStyle($style);
$page->drawText(....);
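The string-splitting step can be sketched independently of Zend_Pdf. This Python sketch assumes a tiny markup in which only <b>...</b> marks bold text (parse_runs is a hypothetical helper). Each resulting (text, is_bold) run would then get its own setStyle()/drawText() pair, and, as the first answer notes, you have to advance the x position past the previous run yourself.

```python
import re


def parse_runs(markup):
    """Split a string with <b>...</b> markup into (text, is_bold) runs."""
    runs = []
    bold = False
    # The capturing group keeps the tags in the split result.
    for part in re.split(r"(</?b>)", markup):
        if part == "<b>":
            bold = True
        elif part == "</b>":
            bold = False
        elif part:
            runs.append((part, bold))
    return runs


print(parse_runs("my <b>text</b>..."))
# [('my ', False), ('text', True), ('...', False)]
```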

Using only CR as linebreak inside pre tag doesn't work

At work, we stumbled upon Bugzilla creating HTML output that led to lines much too long because the browser didn't break the lines.
This was happening on Chrome, but not on Firefox 3.5, so we didn't really care. But Firefox 4 behaves just like Chrome, so we had to find another workaround.
An example is:
<html>
<body>
<pre>
Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
sed diam nonumy eirmod tempor invidunt ut labore et
dolore magna aliquyam erat, sed diam voluptua. At vero eos
et accusam et justo duo dolores et ea rebum. Stet clita kasd
gubergren, no sea takimata sanctus est Lorem ipsum dolor sit
amet.
</pre>
</body>
</html>
The server is using only CR as a line break, which is very uncommon; the usual alternatives (CR+LF, or only LF) work correctly, so the right fix is to tell the Bugzilla server to use one of those. Still, I'm curious why this doesn't work, and whether ignoring such line breaks really is the "correct" behavior for browsers.
Also, I found a strange local workaround for Chrome and FF 4 using a Greasemonkey script (modified version of this one):
var els = document.getElementsByTagName("*");
for(var i = 0, l = els.length; i < l; i++) {
var el = els[i];
el.innerHTML = el.innerHTML;
}
You would expect this to have no effect on the page, but with the script in place, the line breaks suddenly show up correctly.
So my questions are:
Is the Chrome/FF 4 way the "correct" way to handle these kinds of linebreaks inside <pre>?
Why is this Greasemonkey script working?
The GM script works because JS apparently converts CRs (\r) to LFs (\n) dynamically on writes to the DOM.
See this test at jsFiddle. Notice how the CR (decimal 13), at the end of the 2nd line, gets converted to LF (decimal 10).
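The normalization that this answer attributes to the DOM round trip amounts to the following, shown as a Python sketch (normalize_newlines is a hypothetical helper): convert CR LF pairs first, then any remaining bare CR, so that a CR LF pair becomes one line feed rather than two. Applied to the HTML before it is served, the same filter would be one way to get the LF-only output the question asks the Bugzilla server for.

```python
def normalize_newlines(text):
    """Turn CR LF pairs and bare CRs into LF."""
    # Order matters: handle the pair first so "\r\n" becomes one "\n", not two.
    return text.replace("\r\n", "\n").replace("\r", "\n")


pre_text = "Lorem ipsum dolor sit amet,\rsed diam nonumy eirmod\r\ntempor\n"
print(normalize_newlines(pre_text).split("\n"))
# ['Lorem ipsum dolor sit amet,', 'sed diam nonumy eirmod', 'tempor', '']
```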
Yes, the HTML 4.01 specification defines a line break as follows:
http://www.w3.org/TR/html401/struct/text.html#line-breaks
A line break is defined to be a carriage return (&#x000D;), a line feed (&#x000A;), or a carriage return/line feed pair. All line breaks constitute white space.
However, a bare carriage return is extremely rare. I'm not surprised it doesn't work. But technically, I'd say that FF4 and Chrome are in the wrong.
Not sure why your Greasemonkey script works. My guess is that getting el.innerHTML converts CR to CR-LF or LF.