I have a Perl script that converts text to HTML using HTML::TextToHTML.
Some of the original text contains quasi-tables whose alignment matters.
For example:
Job no Description Completed
15 Paving from NE 32 to 45th 11-01-17
Right now I am using this code:
use HTML::TextToHTML;
my $conv = new HTML::TextToHTML();
if ( $HTML eq 'Y' ) { # convert entire body to HTML
$body = $conv->process_chunk($body);
}
But with the code above, the lines often lose their spacing in some email clients.
Is there a way in HTML::TextToHTML to preserve the width of rows and their alignment?
You won't do much better than setting
make_tables => 1
in the constructor, and possibly setting
table_type => { ALIGN => 1, PGSQL => 0, BORDER => 0, DELIM => 0 }
which will split text that is separated by two or more spaces into the columns of a table.
But you may well have to edit your original text file a little to get the best results. For instance, your column headings will be "Job no Description" and "Completed", because there is only a single space after "Job no".
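Put together, the suggestion above might look like the following minimal sketch. The sample text is an assumption (it uses two or more spaces between every column so that the headings split as intended), and whether a table is actually detected depends on the module's heuristics, so treat the result as something to inspect rather than rely on.

```perl
use strict;
use warnings;
use HTML::TextToHTML;

# Assumed sample input: every column is separated by at least two spaces.
my $body = <<'END_TEXT';
Job no  Description                   Completed
15      Paving from NE 32 to 45th     11-01-17
END_TEXT

# Options taken from the answer above: only ALIGN-style tables are enabled.
my $conv = HTML::TextToHTML->new(
    make_tables => 1,
    table_type  => { ALIGN => 1, PGSQL => 0, BORDER => 0, DELIM => 0 },
);

my $html = $conv->process_chunk($body);
print $html;
```

If a particular email client still mangles the result, it is worth checking the generated markup first to see whether a table was produced at all.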
I'm completely new to TypoScript, so I have quite a hard time with the syntax, but I think I am getting there.
My task is to render an HTML table and fill it with values from a database table (doesn't matter which one). In my case I took the tt_content table and tried to fill my HTML table with the "header" field and the "bodytext" field.
So I made a completely empty template and wrote the following code in the "setup" field of the template. I added some headers and texts to the sites I have to test my code but I get a completely empty page not even the "table" HTML tags are there.
After 4 days of research I still don't know what my problem is here so I am quite desperate.
Here is what I have so far:
page = PAGE
page.typeNum = 0
lib.object = COA_INT
lib.object {
10 = TEXT
10.value = <table>
20 = CONTENT
20.wrap = <tr>|</tr>
20 {
table = tt_content
select {
orderBy = sorting
}
renderObj = COA
renderObj {
10 = COA
10 {
10 = TEXT
10 {
field = header
wrap = <td>|</td>
}
20 = TEXT
20 {
field = bodytext
wrap = <td>|</td>
}
}
}
}
20 = TEXT
20.value = </table>
}
If someone could help me out here it would be much appreciated.
Thanks in advance.
First: check whether you have any 'template parser' included.
Go to Template, choose 'Info/Modify' and click on 'edit the whole ...'
There, choose the Includes tab and include 'css_styled_content'. (Yes, there is another way of parsing your content, with 'fluid_styled_content'. You can choose that instead if you are on TYPO3 7.6.* or higher.)
These 'parsers' give you all the TypoScript needed to parse and render your content. Without one of them, nothing will be rendered when you want to render content from the backend.
Second: your TypoScript is wrong.
You have made a content object array (lib.object is a COA_INT) and filled it with content. But you then overwrite the content at key 20 with the closing TEXT object.
change
20 = TEXT
20.value = </table>
to
30 = TEXT
30.value = </table>
Third: you have created a PAGE object, but you did not add your COA to that page object.
Try this:
page = PAGE
page.10 < lib.object
What this does is include your lib.object in the page object at 'level' 10.
you can also do
page.20 = TEXT
page.20.value = hello world
This will be rendered after your lib.object.
As you may have noticed, it is a bit like writing one big array (because TypoScript is one big array ;)
Make sure that you place your lib.object ABOVE the page object declaration, or else the page will not be able to include it.
There is also a Slack channel for TYPO3 that you can join if you have other questions. People over there are more than willing to help you.
https://forger.typo3.org/slack
I'm trying to change the color of the link text in a link to yellow on a page that another script (not controlled by me) generates. More specifically, I'm searching for specific text in two tables on this page. Once I find the text (which are hyperlinks) I want to change their color to yellow.
I am using HTML::Element and I can find the text easily. The problem is, there is no specified link color, so the links use the default value of blue. I am trying to add the HTML element of font color to the tag but I'm not having much luck.
If I try using something like (where "$a" is the HTML::Element object for the link I'm trying to edit):
$a->attr("font color", "yellow");
It adds the attribute but doesn't change the text color of the link content.
if I try something like:
my $content = $a->content;
$content->attr("font color", "yellow");
That only adds the text
<font color=yellow>
to the content without, again, changing the actual content text color.
Trying to splice it in doesn't work either.
I finally hit upon this:
my $yellowFont = HTML::Element->new('font', 'color' => 'yellow');
foreach my $item_ref ($a->content_refs_list) {
next if ref $$item_ref;
$yellowFont->push_content($$item_ref);
}
print $yellowFont->as_HTML, "\n";
Which works beautifully in the sense that it creates:
<font color="yellow">201301022150-Job5</font>
But that change isn't reflected in the html document!
I'm at a loss as to how to insert the font color attribute into the original html document.
Below is my entire script. It's a mess because I've been trying a variety of different methods without success.
#!/usr/local/bin/perl
use warnings;
use strict;
use HTML::TableExtract qw(tree);
use Data::Dumper qw(Dumper);
my @jobList = ();
if ($ARGV[0]) {@jobList = $ARGV[0];} else {die ("Need list of jobs as argument\n")};
my $ddHTMLFile = "./tmp_aptg";
my $te1 = HTML::TableExtract->new( depth => 1, count => 0);
my $te2 = HTML::TableExtract->new( depth => 1, count => 1);
$te1->parse_file($ddHTMLFile);
$te2->parse_file($ddHTMLFile);
my $table1 = $te1->first_table_found;
my $table2 = $te2->first_table_found;
my $table1_tree = $table1->tree;
my $table2_tree = $table2->tree;
foreach my $a ($table1_tree->find_by_tag_name("a")) {
my $href = $a->attr("href");
if ($href =~ m/$jobList[0]/) {
my $yellowFont = HTML::Element->new('font', 'color' => 'yellow');
foreach my $item_ref ($a->content_refs_list) {
next if ref $$item_ref;
$yellowFont->push_content($$item_ref);
}
#print $yellowFont->as_HTML, "\n";
$a->replacewith
$a->dump;
#my $table1_html = $table1_tree->as_HTML;
#my $document1_tree = $te1->tree;
#my $document1_html = $document1_tree->as_HTML;
#my $document_html = $document1_html;
#print "$document_html";
}
}
Each time somebody uses the <font> tag, we have to sacrifice a hecatomb of cute kittens to the angry webdevs that were promised semantic markup. A font in itself has no semantics. Instead, such things can be easily done via CSS, which unsurprisingly excels at changing the color of elements.
To set the color of one element to yellow, we have to add the following code to the style attribute:
color: yellow !important;
Something like
$a->attr(style => "color: yellow !important;");
is likely to do the trick, although that would overwrite any previous contents. We could try to append our color to the previous contents, but we have no guarantee that the CSS already there is valid.
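If overwriting is a concern, appending to the existing style value could look roughly like this. It is only a sketch: the element, its href and its existing style are made up for illustration, the variable is called $link rather than $a (which Perl also uses inside sort blocks), and it makes no attempt to validate the CSS that is already there.

```perl
use strict;
use warnings;
use HTML::Element;

# Hypothetical link that already carries an inline style.
my $link = HTML::Element->new('a', href => 'job5.html', style => 'font-weight: bold');
$link->push_content('201301022150-Job5');

# Read the current style, drop any trailing semicolon and whitespace,
# then append the color declaration.
my $style = $link->attr('style') // '';
$style =~ s/\s*;?\s*\z//;
my $color = 'color: yellow !important;';
$link->attr(style => length $style ? "$style; $color" : $color);

print $link->as_HTML, "\n";
```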
If the target browsers understand CSS3 (*sigh*), we could use some nice selectors to do that job for us, like
<style>
table a[href~="$foo"] { color: yellow !important }
</style>
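where $foo holds a sane string to be literally matched (no regexes). Note that ~= matches a whole whitespace-separated word in the attribute value; to match a substring of the href, use *= instead.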
Here is a data-url you can copy&paste into your address bar to see this (hopefully) working:
data:text/html,<style>table a[href~="foo"] { color: yellow !important }</style><table><tr><td><a href="bar">bar</a></td><td><a href="foo">foo</a></td></tr></table>
The other solution would be to create a new <span> element that carries the CSS and is the sole child of the link. The former children of <a> would then be children of the <span>.
# not tested, but looks reasonable
my $span = HTML::Element->new("span", style => "...");
my @children = $a->detach_content;   # remove the link's current content
$span->push_content(@children);      # move it into the span
$a->push_content($span);             # the span becomes the link's only child
This is slightly different from the previous solution, but this difference shouldn't matter unless some advanced CSS tricks were used in the page layout.
If you really have to, you can adapt this solution to use font tags.
"pleease don't! can we haz <span>?" ← the kittens.
To see what you can do with the HTML element objects, see the HTML::Element documentation.
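The reason the question's <font> element never showed up in the document is that it was built but never attached to the tree. Here is a self-contained sketch of the <span> approach that edits the tree in place and then serializes the whole document. The sample HTML, the hrefs and the $wanted value are assumptions for illustration only.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical stand-in for the generated page.
my $tree = HTML::TreeBuilder->new_from_content(<<'END_HTML');
<table><tr>
<td><a href="job.cgi?id=Job4">201301022149-Job4</a></td>
<td><a href="job.cgi?id=Job5">201301022150-Job5</a></td>
</tr></table>
END_HTML

my $wanted = 'Job5';

# Find every link whose href contains the wanted job name.
for my $link ($tree->look_down(_tag => 'a', href => qr/\Q$wanted\E/)) {
    my $span = HTML::Element->new('span', style => 'color: yellow !important;');
    $span->push_content($link->detach_content);  # move the old content into the span
    $link->push_content($span);                  # attach the span to the original tree
}

# Serializing the root now reflects the change.
my $out = $tree->as_HTML;
print $out, "\n";
$tree->delete;
```

Because the span is pushed into the existing <a> element, anything that serializes the tree afterwards sees the modified link.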
Just putting part of some code here where I am writing two values to a text file.
Ada.Long_Float_Text_IO.Put (File => Output_File, Item => Out_2, Fore => 1, Aft => 4, Exp => 0);
Ada.Text_IO.Put (Output_File, " ");
Ada.Long_Float_Text_IO.Put (File => Output_File, Item => Out_3, Fore => 1, Aft => 4, Exp => 0);
I can separate these numbers Out_2 and Out_3 by a space as shown in the code. The results give (after writing more numbers in the two columns):
-75.2340 421.5700
1256.0000 15.4700
-4568.9800 -118.2800
3784.2100 0.0000
I would like to know if there is a way to specify a tab spacing so that I can have something like this in my text file:
-75.2340    421.5700
1256.0000   15.4700
-4568.9800  -118.2800
3784.2100   0.0000
So which control character produces the above alignment?
Thanks a lot...
For a tab, there’s the obsolescent ASCII.HT or Ada.Characters.Latin_1.HT.
Or you could use the Width parameter to Ada.Long_Float_Text_IO.Put and friends.
Edit: There is no Width parameter for real output! You could use a large Fore, which would effectively right-justify the output.
Instead of the intervening call
Ada.Text_IO.Put (Output_File, " ");
call the Set_Col procedure, which moves the output position on the current line to the specified column. E.g.
Ada.Text_IO.Set_Col(Output_File, 13);
I have written the following Perl script:
use HTML::TreeBuilder;
my $html = HTML::TreeBuilder->new_from_content(<<END_HTML);
<span class=time>1 h </span>
User: There are not enough <b>big</b>
<b>fish</b> in the lake ;
END_HTML
my $source = "foo";
my @time = "10-14-2011";
my $name = $html->find('a')->as_text;
my $comment = $html->as_text;
my #keywords = map { $_->as_text } $html->find('b');
This outputs: foo, 10-14-2011, User, 1h User: There are not enough big fish in the lake, big fish
That is exactly what I wanted from the test HTML, but it only works with the HTML shown above, which I used for test purposes.
The full HTML file contains multiple 'a' and 'b' elements, for instance, and when I print the results those columns come out blank.
How can I account for multiple values for specific searches?
Without sight of your real HTML it is hard to help, but $html->find returns a list of <a> elements, so you could write something like
foreach my $anchor ($html->find('a')) {
print $anchor->as_text, "\n";
}
But that will find all <a> elements, and it is unlikely that that is what you want. $html->look_down() is far more flexible, and provides for searching by attribute as well as by tag name.
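As an illustration of look_down, here is a small sketch that scopes each search to one comment block. The markup, the class names (comment, user) and the user names are invented for the example, since the real HTML is not shown.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical page with several comments.
my $html = HTML::TreeBuilder->new_from_content(<<'END_HTML');
<div class="comment"><a class="user" href="/u/1">Alice</a>: not enough <b>big</b> <b>fish</b></div>
<div class="comment"><a class="user" href="/u/2">Bob</a>: too many <b>weeds</b></div>
END_HTML

my @rows;
# In list context look_down returns every matching element.
for my $comment ($html->look_down(_tag => 'div', class => 'comment')) {
    # In scalar context it returns only the first match (or undef).
    my $user     = $comment->look_down(_tag => 'a', class => 'user');
    my $name     = $user ? $user->as_text : '';
    my @keywords = map { $_->as_text } $comment->find('b');
    push @rows, join(', ', $name, join(' ', @keywords));
}

print "$_\n" for @rows;
$html->delete;
```

Searching from each comment element instead of from the root keeps the name and keywords of one comment together.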
I cannot begin to guess about your problem with comments without seeing what data you are dealing with.
If you need to process each text element independently then you probably need to call the objectify_text method. This turns every text element in the tree into a pseudo element with a ~text tag name and a text attribute, for instance <p>paragraph text</p> would be transformed into <p><~text text="paragraph text" /></p>. These elements can be discovered using $html->find('~text') as normal. Here is some code to demonstrate
use strict;
use warnings;
use HTML::TreeBuilder;
my $html = HTML::TreeBuilder->new_from_content(<<END_HTML);
<span class=time>1 h </span>
User: There are not enough <b>big</b>
<b>fish</b> in the lake ;
END_HTML
$html->objectify_text;
print $_->attr('text'), "\n" for $html->find('~text');
OUTPUT
1 h
User
: There are not enough
big
fish
in the lake ;
I'm trying to get links from a table in HTML. Using HTML::TableExtract, I'm able to parse the table and get the text (i.e. Ability, Abnormal in the example below), but I cannot get the links contained in the table. For example,
<table id="AlphabetTable">
<tr>
<td>
Ability <span class="count">2650</span>
</td>
<td>
Abnormal <span class="count">26</span>
</td>
</table>
Is there a way to get the links using HTML::TableExtract, or another module that could be used in this situation? Thanks
Part of my code:
$mech->get($link->url());
$te->parse($mech->content);
foreach $ts ($te->tables){
foreach $row ($ts->rows){
print @$row[0]; #it only prints text part
#but I want its link
}
}
Use HTML::LinkExtor, passing the extracted table text to its parse method.
my $le = HTML::LinkExtor->new();
foreach $ts ($te->tables){
foreach $row ($ts->rows){
$le->parse($row->[0]);
for my $link_tag ( $le->links ) {
my ($tag, %links) = @$link_tag;
# next if $tag ne 'a'; # exclude other kinds of links?
print for values %links;
}
}
}
Use the keep_html option in the constructor.
keep_html
Return the raw HTML contained in the cell, rather than just the visible text. Embedded tables are not retained in the HTML extracted from a cell. Patterns for header matches must take into account HTML in the string if this option is enabled. This option has no effect if extracting into an element tree structure.
$te = HTML::TableExtract->new( keep_html => 1, headers => [qw(field1 ... fieldN)]);
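A minimal sketch of how keep_html could be used to pull the hrefs out of each cell. The sample table and its href values are assumptions (the links in the question's HTML are not shown), and the regex is a deliberately simple illustration rather than a general HTML parser.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical table; the href values are invented.
my $html = <<'END_HTML';
<table id="AlphabetTable">
<tr>
<td><a href="/words/ability">Ability</a> <span class="count">2650</span></td>
<td><a href="/words/abnormal">Abnormal</a> <span class="count">26</span></td>
</tr>
</table>
END_HTML

# keep_html => 1 makes each cell contain its raw HTML instead of plain text.
my $te = HTML::TableExtract->new( keep_html => 1 );
$te->parse($html);
$te->eof;

my @links;
for my $ts ($te->tables) {
    for my $row ($ts->rows) {
        for my $cell (grep { defined } @$row) {
            # Collect the href of every <a> tag found in the cell.
            push @links, $cell =~ /<a\b[^>]*\bhref\s*=\s*["']([^"']*)["']/gi;
        }
    }
}

print "$_\n" for @links;
```

The raw cell HTML could equally be handed to HTML::LinkExtor, as in the other answer, instead of using a regex.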