I have a template that, amongst other things, contains this:
<nowiki>{{{1}}}</nowiki>
Now, I knew when I wrote this that it wouldn't work, but it shows what I'm trying to do: the contents of {{{1}}} should not be parsed as wiki markup. Of course, this instead results in {{{1}}} itself not being expanded.
How can I achieve this?
Update:
I found the answer on Wikipedia, but I'm having a hard time implementing it. I've copied the Dtag and Nowiki meta-templates from Wikipedia, but I can't figure out how to make them work. I'm using:
{{<includeonly>subst:</includeonly>Nowiki|{{{1}}}}}</nowiki>
And it results in this output:
{{subst:dtag|nowiki|{{{1}}}}}</nowiki>
Since version 1.12, there is a magic word for arbitrary tag insertion: {{#tag:nowiki|{{{1}}}}}. See the documentation.
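For example, a template whose entire body is the #tag call will emit its first argument as literal text (the template name Foo below is invented for illustration):

```
<!-- Template:Foo -->
{{#tag:nowiki|{{{1}}}}}
```

Calling {{Foo|'''not bold'''}} should then render the literal text '''not bold''' rather than bold text.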
Related
I have a situation here and need your suggestion. Through QTP or UFT, I need to validate a page from an HTML point of view.
In detail: assume that there is a snippet of HTML (like test test>). QTP should read the text (which I can do using the innerhtml property) and then say whether the snippet is valid. Basically, I am not really worried about the content, but the HTML tags should be validated and need to be syntactically correct.
Any help would be appreciated.
Get the elements that interest you with Description.Create, then check the outerhtml of every object. If you want to build everything from scratch (which I assume, since you are not using existing validators), you will have to write your own parser that counts brackets, quotation marks, etc. Based on regexes, you would have to check the semantics after every defined word, such as title. A lot of work with regexes... and not only regexes...
Simpler, but "dirty" solution: copy what you have, open a new tab with an HTML validator, paste it, and check the output :D
Edit: after long research I found an article on the HP blog (The Future of Testing blog) that might be helpful in solving this type of problem: link
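The "write your own parser" idea can be sketched with Python's standard-library html.parser, which tokenizes the snippet while we verify that tags open and close in matching order. The VOID set and function names here are my own, and this is only a rough well-formedness check, not a full validator (html.parser is tolerant and will not flag every malformed construct):

```python
from html.parser import HTMLParser

# Void elements are allowed to stand alone without a closing tag.
VOID = {"br", "hr", "img", "input", "meta", "link", "area", "base",
        "col", "embed", "source", "track", "wbr"}

class BalanceChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.ok = True

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # each closing tag must match the most recently opened one
        if not self.stack or self.stack.pop() != tag:
            self.ok = False

def is_balanced(snippet):
    checker = BalanceChecker()
    checker.feed(snippet)
    checker.close()
    # valid only if no mismatch occurred and nothing is left open
    return checker.ok and not checker.stack

print(is_balanced("<b>test</b>"))   # True
print(is_balanced("<b>test"))       # False
```

QTP/UFT itself runs VBScript, so a check like this would have to live in an external tool; the point is only to illustrate the tag-counting approach.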
There is a lot of argument back and forth over when and if it is ever appropriate to use a regex to parse html.
Since a common problem that comes up is parsing links from HTML, my question is: would using a regex be appropriate if all you were looking for was the href value of <a> tags in a block of HTML? In this scenario you are not concerned about closing tags, and you have a pretty specific structure you are looking for.
It seems like significant overkill to use a full HTML parser. While I have seen questions and answers indicating that using a regex to parse URLs, while largely safe, is not perfect, the extra constraints of structured <a> tags would appear to provide a context where one should be able to achieve 100% accuracy without breaking a sweat.
Thoughts?
Consider this valid html:
<!DOCTYPE html>
<title>Test Case</title>
<p>
<!-- <a href="url1"> -->
<span class="><a href='url2'>"></span>
<a href='my">url<'>click</a>
</p>
What is the list of URLs to be extracted? A parser would say just a single URL, with the value my">url<. Would your regular expression?
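To make that concrete, here is a sketch in Python comparing the stdlib html.parser against a naive href regex on essentially the snippet above; the regex pattern is just one typical attempt, not the only possible one:

```python
import re
from html.parser import HTMLParser

html = """<!DOCTYPE html>
<title>Test Case</title>
<p>
<!-- <a href="url1"> -->
<span class="><a href='url2'>"></span>
<a href='my">url<'>click</a>
</p>"""

class HrefCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # the parser has already resolved comments and quoted attributes
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href")

c = HrefCollector()
c.feed(html)
print(c.hrefs)  # ['my">url<']

# a typical naive regex grabs the commented-out link, the one hidden
# inside an attribute value, and a truncated third value
print(re.findall(r"""href=["'](.*?)["']""", html))  # ['url1', 'url2', 'my']
```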
I'm one of those people who think using regex in this situation is a bad idea.
Even if you just want to match the href attribute of an <a> tag, your regex will still run through the whole HTML document, which makes any regex-based solution cluttered, unsafe and bloated.
Plus, matching href attributes from tags with an XML parser is anything but overkill.
I have been parsing HTML pages every week for at least two years now. At first, I used full-regex solutions, thinking they were easier and simpler than using an HTML parser.
But I had to come back to my code quite a lot, for many reasons:
the source code had changed
one of the source pages had broken HTML and I hadn't tested it
I didn't try my code on every page of the source, only to find out a few of them didn't work.
...
I found that fixing long regex patterns is not exactly the most fun thing; you have to wrap your mind around them again and again.
What I usually do now is:
use tidy to clean the HTML source;
use DOM + XPath to actually parse the page and extract the parts I want;
use regexes only on small, text-only parts (like the trimmed textContent of a node).
The code is far more robust; I don't have to spend two hours on a long regex pattern to find out why it isn't working for 1% of the sources. It just feels proper.
Now, even in cases where I'm not concerned about closing tags and I have a pretty specific structure, I'm still using DOM based solutions, to keep improving my skills with DOM libraries and just produce better code.
I don't like to see on here people who just comment "Don't use regex on html" on every html+regex tagged question, without providing sample code or something to start with.
Here is an example that matches href attributes from links in PHP, just to show that using an HTML parser for these common tasks isn't overkill at all.
$dom = new DOMDocument();
// suppress parse warnings from real-world (possibly broken) HTML
libxml_use_internal_errors(true);
$dom->loadHTML($html);

// loop over every link
foreach ($dom->getElementsByTagName('a') as $link) {
    // get its href attribute
    $href = $link->getAttribute('href');
    // do whatever you want with it...
}
I hope this helps somehow.
I proposed this one:
<a.*?href=["'](?<url>.*?)["'].*?>(?<name>.*?)</a>
on this thread.
It can fail, though, depending on what appears inside name.
Robert C. Martin's book Clean Code contains the following:
HTML in source code comments is an abomination [...] If comments are going to be extracted by some tool (like Javadoc) to appear in a Web page, then it should be responsibility of that tool, and not the programmer, to adorn the comments with appropriate HTML.
I kind of agree - source code surely would look cleaner without the HTML tags - but how do you make decent-looking Javadoc pages then? There's no way even to separate paragraphs without using an HTML tag. The Javadoc manual says it clearly:
A doc comment is written in HTML.
Are there some preprocessor tools that could help here? Markdown syntax might be appropriate.
I agree. (This is also the reason why I am -strongly- opposed to C#-style "XML comment blocks"; the Javadoc DSL at least provides some escape for top-level entities!). To this end I simply do not try to make the javadoc look pretty...
...anyway, you may be interested in Doxygen. Here is a very quick post Doxygen versus Javadoc. It also brings up the issues that you do :-)
HTML is nothing I'd like to see in "normal" comments. But for tools like Javadoc, HTML adds the possibility of formatting information, bullet points, etc...
I would distinguish these two things:
Non-Javadoc code comments are for the programmer who maintains or enhances the code in question. He has to dig through existing sources, and any HTML in comments just doesn't make things easier. So, ban it in normal comments.
Javadoc comments are used to generate documentation. Use HTML where it helps, but a very limited subset of HTML should suffice.
I have some HTML (in this case created via TinyMCE) that I would like to add to a page. However, for security reasons, I don't want to just print everything the user has entered.
Does anyone know of a templatetag (a filter, preferably) that will allow only a safe subset of html to be rendered?
I realize that markdown and others do this. However, they also add additional markup syntax which could be confusing for my users, since they are using a rich text editor that doesn't know about markdown.
There's removetags, but it's a blacklisting approach which fails to remove tags when they don't look exactly like the well-formed tags Django expects, and of course since it doesn't attempt to remove attributes it is totally vulnerable to the 1,000 other ways of script-injection that don't involve the <script> tag. It's a trap, offering the illusion of safety whilst actually providing no real security at all.
HTML-sanitisation approaches based on regex hacking are almost inevitably a total fail. Using a real HTML parser to get an object model for the submitted content, then filtering and re-serialising in a known-good format, is generally the most reliable approach.
If your rich text editor outputs XHTML it's easy, just use minidom or etree to parse the document then walk over it removing all but known-good elements and attributes and finally convert back to safe XML. If, on the other hand, it spits out HTML, or allows the user to input raw HTML, you may need to use something like BeautifulSoup on it. See this question for some discussion.
Filtering HTML is a large and complicated topic, which is why many people prefer the text-with-restrictive-markup languages.
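A minimal sketch of that parse-filter-reserialise idea using only Python's standard library. The tag/attribute whitelists and the javascript: check below are illustrative assumptions, not a complete policy; a vetted library such as HTML Purifier or html5lib remains the better choice in production:

```python
import html
from html.parser import HTMLParser

# Illustrative whitelists -- a real policy needs much more thought.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "a"}
ALLOWED_ATTRS = {"a": {"href"}}

class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []
        self.skip = 0  # depth inside <script>/<style>, whose text we drop

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
            return
        if self.skip or tag not in ALLOWED_TAGS:
            return
        safe = ""
        for name, value in attrs:
            # keep only whitelisted attributes, and refuse javascript: URLs
            if (name in ALLOWED_ATTRS.get(tag, set())
                    and not (value or "").lower().startswith("javascript:")):
                safe += f' {name}="{html.escape(value or "")}"'
        self.out.append(f"<{tag}{safe}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
        elif not self.skip and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # re-escape text so it can't smuggle markup back in
        if not self.skip:
            self.out.append(html.escape(data, quote=False))

def sanitize(markup):
    s = Sanitizer()
    s.feed(markup)
    s.close()
    return "".join(s.out)

print(sanitize('<p onclick="x()">Hi <script>alert(1)</script><b>bold</b></p>'))
```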
Use HTML Purifier, html5lib, or another library that is built to do HTML sanitization.
You can use removetags to specify a list of tags to be removed:
{{ data|removetags:"script" }}
I have an FAQ in HTML (example) in which the questions refer to each other a lot. That means whenever we insert/delete/rearrange the questions, the numbering changes. LaTeX solves this very elegantly with \label and \ref -- you give items simple tags and LaTeX worries about converting to numbers in the final document.
How do people deal with that in HTML?
ADDED: Note that this is no problem if you don't have to actually refer to items by number, in which case you can set a tag with
<a name="foo">
and then link to it with
<a href="#foo">some non-numerical way to refer to foo</a>.
But I'm assuming "foo" has some auto-generated number, say from an <ol> list, and I want to use that number to refer to and link to it.
There is nothing like this in HTML.
The way you would normally solve this is by generating the HTML for the links: either by parsing the HTML itself and inserting the TOC (you can do that on the server, before you send the HTML out to the browser, or on the client, by traversing the DOM with a little piece of ECMAScript and simply collecting and inspecting all <a> elements), or by generating the entire HTML document from a higher-level source like a database, an XML document, markdown or – why not? – even LaTeX.
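The last option, generating the document from a higher-level source, can be sketched in a few lines of Python. The [#key] reference syntax and the sample questions here are invented for illustration; the point is that numbers are derived from position, so inserting or reordering questions renumbers every cross-reference automatically, roughly mimicking \label and \ref:

```python
# Questions live in a plain list; each one has a stable key.
questions = [
    ("install", "How do I install it? See also question [#upgrade]."),
    ("upgrade", "How do I upgrade from an old version?"),
]

# Number each question by its position in the list.
numbers = {key: i + 1 for i, (key, _) in enumerate(questions)}

def render(text):
    # replace each [#key] reference with a numbered link to that question
    for key, n in numbers.items():
        text = text.replace(f"[#{key}]", f'<a href="#{key}">{n}</a>')
    return text

items = [f'<li id="{key}">{render(text)}</li>' for key, text in questions]
faq_html = "<ol>\n" + "\n".join(items) + "\n</ol>"
print(faq_html)
```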
I know it's not widely supported by browsers, but you can do this using CSS counters.
Also, consider using ids instead of names for your anchors.
Instead of \label{key} use <a name="key" />. Then link using <a href="#key">Link</a>.
PrinceXML can do that, but that's about it. I suppose it'd be best to use server-side scripting.
Here's how I ended up solving this with a php script:
http://yootles.com/genfaq
It's roughly as convenient as \label and \ref in LaTeX and even auto-generates the index of questions.
And I put it on an etherpad instance which is handy when multiple people are contributing questions to the FAQ.