Can Template::Toolkit pretty-print the HTML it outputs?

I guess the answer is yes, but what’s the easiest way to do it? I use Template::Toolkit::Simple, if that makes a difference.

I don't think so, but since the output is based on a template, you can just keep the template itself neatly formatted.

If you want to create pretty-printed HTML, then you have to use Template::Plugin::HTML instead of Template::Toolkit::Simple.
You may also try these approaches: Template::Flute and Markapl.

Related

What is the meaning of the URL

There is a * in the path of the following URL. Does it have special purpose?
...
Thanks
The most probable explanation is that some kind of parsing is taking place using JavaScript. Where did you find that?
If I'm correct, this is called URL mapping: some kind of framework fills in the asterisk with dynamic values.

How to extract meaningful text from HTML

I would like to parse an HTML page and extract the meaningful text from it. Does anyone know of good algorithms for this?
I develop my applications on Rails, but I think Ruby is a bit slow for this, so a good C library would be appropriate if one exists.
Thanks!!
PS: Please do not recommend anything in Java
UPDATE:
I found this link text
Sadly, it is in Python
Use Nokogiri, which is fast and written in C, for Ruby.
(Using regexp to parse recursive expressions like HTML is notoriously difficult and error prone and I would not go down that path. I only mention this in the answer as this issue seems to crop up again and again.)
With a real parser like for instance Nokogiri mentioned above, you also get the added benefit that the structure and logic of the HTML document is preserved, and sometimes you really need those clues.
Solutions integrating with Ruby
Use Nokogiri, as recommended by Amigable Clark Kant
Use Hpricot
External Solutions
If your HTML is well-formed, you could use the Expat XML Parser for this.
For something more targeted toward HTML-only, the W3C actually released the code for the LibWWW, which contains a simple HTML parser (documentation).
Lynx is able to do this. This is open source if you want to take a look at it.
You should strip every angle-bracketed part from the text and then collapse the whitespace.
In theory, literal < and > should not appear anywhere else: pages contain &lt; and &gt; in their place.
Collapsing whitespace: convert every TAB, newline, etc. to a space, then replace every run of spaces with a single space.
UPDATE: And you should start after finding the <body> tag.
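The steps above can be sketched in a few lines. This is only a naive illustration of the strip-and-collapse idea, not a replacement for a real parser such as Nokogiri; the function name extractText is made up for the example, and it assumes the page contains no script or style blocks and that entities do not need decoding.

```javascript
// Naive text extraction: a sketch of the strip-and-collapse steps, not an HTML parser.
// extractText is a hypothetical name chosen for this example.
function extractText(html) {
  // Start after the opening <body> tag if there is one, otherwise at the beginning.
  const bodyMatch = /<body\b[^>]*>/i.exec(html);
  const start = bodyMatch ? bodyMatch.index + bodyMatch[0].length : 0;
  return html
    .slice(start)
    .replace(/<[^>]*>/g, ' ') // replace every angle-bracketed part with a space
    .replace(/\s+/g, ' ')     // collapse tabs, newlines and runs of spaces
    .trim();
}

const page =
  '<html><head><title>T</title></head><body>\n' +
  '<h1>Title</h1>\n<p>Some   text\twith <b>markup</b> inside</p>\n' +
  '</body></html>';
console.log(extractText(page));
```

Tags are replaced with a space rather than removed outright so that words in adjacent elements do not run together; the cost is a stray space where a tag sits directly before punctuation.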

ActionScript dynamic classes

Could anyone give me a good use case for ActionScript dynamic classes?
Because it really looks like bad practice to me, in every case.
Anything that uses Proxy must by extension be dynamic. I use Proxy fairly regularly; for example, here's a replacement syntax for ExternalInterface using Proxy.
URLVariables, for example.
You could store the data in a dictionary / object / array too, but you don't gain much in this case, I think, and you cut down some boilerplate.
Well you could use an Object object, but using a dynamic class ensures that it is typed. That's the way I see it, and it's the only reason I would use them. What Juan Pablo is saying is a good reason too.

Is there a way to use CSS to highlight keywords?

Well, here's the context: I am editing a LaTeX source file in Google Docs, and I wonder if I could use CSS to color arbitrary keywords and text enclosed in dollar signs.
For example, given this HTML file:
<html><body>
\section{Heading 1}
<br>
This is a simple file with a formula $x_1 = x_0 + 1$.
<br>
Here it ends \cite{somebody}.
</body></html>
I wanted CSS to let me see this:
\section{Heading 1}
This is a simple file with a formula $x_1 = x_0 + 1$.
Here it ends \cite{somebody}.
I assume it can't be done, since there is no markup isolating these constructs I want to format.
Cheers.
EDIT: Seems like the sample output is not colored as I intended, although it is in the edit view.
You'd need to insert a span-element to wrap around those bits you want highlighted, then style them with a different background color or something else.
So no, a pure CSS-based solution is impossible.
You're correct. There is no way to do this in CSS alone. Doing so in JavaScript, however, would be quite trivial.
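For what it's worth, here is a rough sketch of the span-wrapping idea in JavaScript. The function name highlightLatex and the class names math and command are invented for the example, and it assumes the input is plain text that contains no HTML special characters of its own.

```javascript
// Hypothetical helper: wraps $...$ formulas and LaTeX commands in <span> elements
// so that CSS rules such as .math { ... } and .command { ... } can style them.
function highlightLatex(text) {
  // A single pass with alternation, so the spans we insert are never re-scanned.
  return text.replace(/(\$[^$]*\$)|(\\[a-zA-Z]+)/g, (match, math, command) =>
    math !== undefined
      ? '<span class="math">' + math + '</span>'
      : '<span class="command">' + command + '</span>'
  );
}

console.log(highlightLatex('A formula $x_1 = x_0 + 1$ and \\cite{somebody}.'));
```

The result would then be assigned to the innerHTML of whatever element holds the text, with the coloring itself left to ordinary CSS rules on the two classes.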
As noted by Arve and Gary, I don't think a pure-CSS solution is possible.
However, if you are able to use JavaScript in your context, it is possible. I am using SyntaxHighlighter by Alex Gorbatchev for syntax highlighting tasks. I would recommend it as long as the kind of style that SyntaxHighlighter can produce fits your needs.
There is some work to do, however. It uses a "brush" plugin architecture, and although there is no brush defined for LaTeX, it should be quicker to create a brush than to build a syntax highlighting solution from scratch.

A regular expression to remove a given (x)HTML tag from a string

Let's say I have a string holding a mess of text and (x)HTML tags. I want to remove all instances of a given tag (and any attributes of that tag), leaving all other tags and text alone. What's the best Regex to get this done?
Edited to add: Oh, I appreciate that using a Regex for this particular issue is not the best solution. However, for the sake of discussion can we assume that that particular technical decision was made a few levels over my pay grade? ;)
Attempting to parse HTML with regular expressions is generally an extremely bad idea. Use a parser instead, there should be one available for your chosen language.
You might be able to get away with something like this:
</?tag[^>]*?>
But it depends on exactly what you're doing. For example, that won't remove the tag's content, and it may leave your HTML in an invalid state, depending on which tag you're trying to remove. It also copes badly with invalid HTML (and there's a lot of that about).
Use a parser instead :)
I think there is some serious anti-regex bigotry happening here. There are lots of times when you may want to strip a particular tag out of some markup when it doesn't make sense to use a full blown parser.
Of course there are times when a parser might be the best option, but if you are looking for a regex then:
<script[^>]*?>[\s\S]*?<\/script>
That would remove script tags and their contents. Make sure that you use case-insensitive matching.
If you don't want to remove the contents of the tag then you can use:
<\/?script[^>]*?>
An example of usage in javascript would be:
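function stripScripts(markup) {
    // Remove every <script>...</script> element, including its contents.
    return markup.replace(/<script[^>]*?>[\s\S]*?<\/script>/gi, '');
}
// textarea is assumed to be a reference to a <textarea> element on the page.
var safeText = stripScripts(textarea.value);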
I think it might be Raymond Chen (blogs.msdn.com/oldnewthing) that I'm paraphrasing (badly!) here... But, you want a Regular Expression? "Now you have two problems" ... :=)
If the string is well-formed (X)HTML, could you load it up into a parser (HTML/XML) and use this to remove any nodes of the offending variety? If it's not well-formed, then it becomes a bit more tricky, but, I suspect that a RegEx isn't the best way to go about this...
There are just TOO many ways a single tag can appear, not to mention encodings, variants, etc.
I strongly suggest you rethink this approach... you really shouldn't have to be handling HTML directly, anyway.
Off the top of my head, I'd say this will get you started in the right direction.
s/<TAG[^>]*>([^<]*)<\/TAG[^>]*>/\1/
Basically find the starting tag, any text in between the tags, and then the ending tag. Replace the whole thing with whatever was in between the tags.
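The same idea, sketched in JavaScript for comparison. The function name unwrapTag is made up for the example, and it assumes the tag argument is a plain tag name (letters and digits only) so that it can be spliced into the pattern safely.

```javascript
// Replaces <tag ...>text</tag> with just "text", using a capture group for the content.
// unwrapTag is a hypothetical helper; `tag` is assumed to be a plain tag name.
function unwrapTag(markup, tag) {
  const pattern = new RegExp('<' + tag + '[^>]*>([^<]*)</' + tag + '[^>]*>', 'gi');
  return markup.replace(pattern, '$1');
}

console.log(unwrapTag('Keep <em class="x">this</em> and <strong>that</strong>.', 'em'));
// Content that contains another tag is left untouched, because [^<]* stops at the first "<".
console.log(unwrapTag('<em>a <b>b</b></em>', 'em'));
```

Because the content group is [^<]*, this only handles elements whose content is plain text; anything nested is skipped rather than mangled.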
Corrected answer:
</?TAG\b[^>]*?>
Because Dan's answer would also remove <br />, when you only want to remove <b>.
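A quick way to see the difference the \b makes, sketched in JavaScript with b standing in for TAG (the variable names are only for illustration):

```javascript
// Compare the pattern with and without the word boundary after the tag name.
const markup = 'line<br />break and <b>bold</b> text';
const withoutBoundary = /<\/?b[^>]*?>/gi;  // "b" followed by anything, so <br /> matches too
const withBoundary = /<\/?b\b[^>]*?>/gi;   // "b" must end the tag name, so <br /> is skipped

const strippedTooMuch = markup.replace(withoutBoundary, '');
const strippedOnlyB = markup.replace(withBoundary, '');

console.log(strippedTooMuch);
console.log(strippedOnlyB);
```

The first result loses the <br /> along with the <b> tags, while the second keeps it.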
Here's a regex I wrote for this purpose, it works in a few more situations:
</?(?(?=b|img|a|script)notag|[a-zA-Z0-9]+)(?:\s[a-zA-Z0-9\-]+=?(?:(["",']?).*?\1?)?)*\s*/?>
While using regexes for parsing HTML is generally frowned upon or looked down on, you almost certainly don't want to write your own parser.
You could however use some inbuilt or library functions to achieve what you need.
JavaScript has getElementsByTagName and getElementById, not to mention jQuery.
PHP has the DOM extension.
Python has the awesome Beautiful Soup
...and many more.