XSLT - HTML id attribute without quotes <div id=myId> - html

For my output HTML file, I have to produce a div element with an id attribute, but the value of the attribute must not be in quotes, as in this example: <div id=myID>...</div>. Everything works perfectly when I use quotes, as here: <div class="myClass" id="{$myIdVariable}">...</div>. Is it possible to tell Oxygen or Saxon to leave the quotes out in such cases? In the end I'm using the Java javax.xml.transform package, and I don't know whether I can tell the classes I use to do the same. I would be very glad if someone has a good solution for this problem, or could tell me that this is not possible with XSLT...

I believe your title should read without quotes, "", not without parentheses, ().
No, XSLT is not going to help you create XML that's not well-formed. (You could stand on your head and output text rather than XML to achieve such an effect, but don't do that.) Attribute values must have single, ', or double quote, ", delimiters for the markup to be XML. Even the HTML output option is not going to serialize attribute values without quote delimiters.
In comments, @Ole asks:
In principle you are right, but I thought that in HTML5, also attributes without quotes are allowed?
Yes, in HTML5, unquoted attribute values are allowed, but you'll be better off using the single-quoted and double-quoted attribute value syntaxes that are also supported in HTML5, especially if you want to be able to leverage XML tools.

Related

Is there any difference for data-attribute=false with data-attribute="false" in html element?

I have a data attribute in an HTML element, as in <button data-verified=false>Update</button>. It has a boolean value for the data attribute.
Is there any difference from the following element, <button data-verified="false">Update</button>, where the data attribute value is wrapped in double quotes?
Are boolean values supported in HTML?
Boolean attributes are supported in HTML, but data-verified isn't one of them, no matter how it appears in the markup. data-verified=false and data-verified="false" both create an attribute of type string with the value "false", which will be treated as true if tested as a boolean in JS.
This is only the case because false doesn't contain spaces. As a counterexample, data-verified=not true is invalid and not at all the same as data-verified="not true".
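To see the equivalence concretely, here is a minimal sketch using Python's standard html.parser module. It is a lenient tokenizer rather than a browser's HTML5 parser, so treat it as an illustration only; the AttrCollector class and parse_attrs helper are made-up names for this example.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Records every start tag together with its parsed attribute list."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


def parse_attrs(markup):
    """Hypothetical helper: return [(tag, [(name, value), ...]), ...]."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.seen


unquoted = parse_attrs('<button data-verified=false>Update</button>')
quoted = parse_attrs('<button data-verified="false">Update</button>')
spaced = parse_attrs('<button data-verified=not true>Update</button>')

print(unquoted == quoted)  # both carry the string "false"
print(spaced)              # the space ends the value; "true" becomes its own attribute
```

In all cases the value is a plain string. A non-empty string such as "false" is truthy in Python as well, so the same caution about boolean tests applies.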
There are no differences in the values - however, always prefer to quote around attribute values, because:
Looks cleaner
Easier to maintain
Every editor can deal with it easily
It's a standard, nearly all HTML code examples you'll see use the value quoted
My answer is corroborated by "Do you quote HTML5 attributes?"
I think it is just a convention that attributes always have double quotes.
However, in jQuery you can use the .data() method, which is smart enough to recognize booleans and numeric values.
The only difference is that only the latter is allowed in XHTML. In HTML syntax, they both are allowed, and they are equivalent: the difference is lost when the HTML markup is parsed, and the DOM contains in both cases only the string false.
This follows from general principles in HTML and does not depend on the name of the attribute in any way.
“Boolean value” is a vague term. In HTML5, some attributes are called “boolean attributes”, but this is strongly misleading – especially since values true and false, far from being the only values allowed, aren’t allowed at all for such attributes. You need to read the specification of “boolean attributes” to see what they really are.
When you use data-* attributes, it is completely up to you what you use as values and how you process them.

quoting HTML attribute values

I know the spec allows both ' and " as delimiters for attribute values, and I also know it's a good practice to always quote.
However I consider " being the cleaner way, maybe it's just me having grown up with C and C++' syntax.
What is the cleanest way of quoting attribute values and why? Please no subjective answers.
Both are fine, but Double quotes are better (IMHO) as you reduce the risk of dynamic values causing errors. e.g.
<input value='${lastName}'/>
<input value='O'Graddy'/>
^^^^^^^
vs.
<input value="${lastName}"/>
<input value="O'Graddy"/>
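Whichever delimiter you pick, the more robust fix for dynamic values is to escape them before they go into the attribute. A minimal sketch in Python, assuming the value is interpolated on the server; input_tag is a made-up helper name, and html.escape is the standard-library function:

```python
from html import escape


def input_tag(value):
    """Hypothetical helper: build an <input> with a safely escaped value."""
    # quote=True also escapes " and ', so either delimiter style stays intact.
    return f'<input value="{escape(value, quote=True)}"/>'


print(input_tag("O'Graddy"))
print(input_tag('say "hi" & <wave>'))
```

With escaping in place, a name containing an apostrophe or a double quote can no longer terminate the attribute early.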
There’s a lot of rules to remember if you want to omit quotes around attribute values. It’s probably easiest to just use quotes consistently; it avoids all kinds of problems.
If you’re interested, I did some research on unquoted attribute values in HTML, CSS and JavaScript a while ago, and wrote about it here: http://mathiasbynens.be/notes/unquoted-attribute-values
I’ve also created a tool that will tell you if a value you enter is a valid unquoted attribute value or not: http://mothereffingunquotedattributes.com/#foo%7Cbar
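As a rough illustration of those rules, the sketch below checks the character restrictions the HTML syntax places on unquoted attribute values: not empty, and no ASCII whitespace, ", ', =, <, > or backtick. The function name can_be_unquoted is made up for this example, and the check deliberately ignores the separate rules about character references (ambiguous ampersands).

```python
import re

# Characters that may not appear in an unquoted attribute value:
# tab, LF, FF, CR, space, double quote, single quote, =, <, >, backtick.
_FORBIDDEN = re.compile(r'[\t\n\f\r "\'=<>`]')


def can_be_unquoted(value):
    """Hypothetical helper: True if value could be written without quotes."""
    return value != "" and _FORBIDDEN.search(value) is None


for candidate in ["myId", "foo|bar", "not true", "a=b", ""]:
    print(repr(candidate), can_be_unquoted(candidate))
```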
Either is good, as long as you use it. " is more popular.

Single vs Double quotes (' vs ")

I've always used single quotes when writing my HTML by hand. I work with a lot of rendered HTML which always uses double quotes. This allows me to determine if the HTML was written by hand or generated. Is this a good idea?
What is the difference between the two? I know they both work and are supported by all modern browsers but is there a real difference where one is actually better than the other in different situations?
The w3 org said:
By default, SGML requires that all attribute values be delimited using either double quotation marks (ASCII decimal 34) or single quotation marks (ASCII decimal 39). Single quote marks can be included within the attribute value when the value is delimited by double quote marks, and vice versa. Authors may also use numeric character references to represent double quotes (&#34;) and single quotes (&#39;). For double quotes authors can also use the character entity reference &quot;.
So... seems to be no difference. Only depends on your style.
I use " as a top-tier and ' as a second tier, as I imagine most people do. For example
<a href="#" onclick="alert('Clicked!');">Click Me!</a>
In that example, you must use both, it is unavoidable.
Quoting Conventions for Web Developers
The Short Answer
In HTML, single quotes (') and double quotes (") are interchangeable; there is no difference.
But consistency is recommended, therefore we must pick a syntax convention and use it regularly.
The Long Answer
Web Development often consists of many programming languages. HTML, JS, CSS, PHP, ASP, RoR, Python, etc. Because of this we have many syntax conventions for different programming languages. Often habits from one language will follow us to other languages, even if it is not considered "proper" i.e. commenting conventions. Quoting conventions also falls into this category for me.
But I tend to use HTML tightly in conjunction with PHP. And in PHP there is a major difference between single quotes and double quotes. In PHP with double quotes "you can insert variables directly within the text of the string". (scriptingok.com) And when using single quotes "the text appears as it is". (scriptingok.com)
PHP takes longer to process double quoted strings. Since the PHP parser has to read the whole string in advance to detect any variable inside—and concatenate it—it takes longer to process than a single quoted string. (scriptingok.com)
 
Single quotes are easier on the server. Since PHP does not need to read the whole string in advance, the server can work faster and happier. (scriptingok.com)
Other things to consider
Frequency of double quotes within string. I find that I need to use double quotes (") within my strings more often than I need to use single quotes (') within strings. To reduce the number of character escapes needed I favor single quote delimiters.
It's easier to make a single quote. This is fairly self explanatory but to clarify, why press the SHIFT key more times than you have to.
My Convention
With this understanding of PHP I have set the convention (for myself and the rest of my company) that strings are to be represented with single quotes by default for server optimization. Double quotes are used within the string if quotes are required, such as JavaScript within an attribute, for example:
<button onClick='func("param");'>Press Me</button>
Of course if we are in PHP and want the parser to handle PHP variables within the string we should intentionally use double quotes. $a='Awesome'; $b = "Not $a";
Sources
Single quotes vs Double quotes in PHP. (n.d.). Retrieved November 26, 2014, from http://www.scriptingok.com/tutorial/Single-quotes-vs-double-quotes-in-PHP
If it's all the same, perhaps using single-quotes is better since it doesn't require holding down the shift key. Fewer keystrokes == less chance of repetitive strain injury.
Actually, the best way is the way Google recommends. Double quotes:
https://google.github.io/styleguide/htmlcssguide.xml?showone=HTML_Quotation_Marks#HTML_Quotation_Marks
See https://google.github.io/styleguide/htmlcssguide.xml?showone=HTML_Validity#HTML_Validity
Quoted Advice from Google: "Using valid HTML is a measurable baseline quality attribute that contributes to learning about technical requirements and constraints, and that ensures proper HTML usage."
In HTML I don't believe it matters whether you use " or ', but it should be used consistently throughout the document.
My own usage prefers that attributes/html use ", whereas all javascript uses ' instead.
This makes it slightly easier, for me, to read and check. If your use makes more sense for you than mine would, there's no need for change. But, to me, your code would feel messy. It's personal is all.
Using double quotes for HTML
i.e.
<div class="colorFont"></div>
Using single quotes for JavaScript
i.e.
$('#container').addClass('colorFont');
$('<div class="colorFont2"></div>');
I know LOTS of people wouldn't agree, but this is what I do and I really enjoy such a coding style: I actually don't use any quote in HTML unless it is absolutely necessary.
Example:
<form method=post action=#>
<fieldset>
<legend>Register here: </legend>
<label for=account>Account: </label>
<input id=account type=text name=account required><br>
<label for=password>Password: </label>
<input id=password type=password name=password required><br>
...
Double quotes are used only when there are spaces in the attribute values or whatever:
<form class="val1 val2 val3" method=post action=#>
...
</form>
I had an issue with Bootstrap where I had to use double quotes as single quotes didn't work.
class='row-fluid' made the last <span> fall below the other <span>s, rather than sitting nicely beside them on the far right. class="row-fluid" worked.
It makes no difference to the html but if you are generating html dynamically with another programming language then one way may be easier than another.
For example in Java the double quote is used to indicate the start and end of a String, so if you want to include a doublequote within the String you have to escape it with a backslash.
String s = "<a href=\"link\">a Link</a>";
You don't have such a problem with the single quote, so using single quotes makes for more readable code in Java.
String s = "<a href='link'>a Link</a>";
Especially if you have to write html elements with many attributes.(Note I usually use a library such as jhtml to write html in Java, but not always practical to do so)
If you are writing ASP.NET then occasionally you have to use double quotes in Eval statements and single quotes for delimiting the values - this is mainly so that the C# inline code knows it's using a string in the Eval container rather than a character. Personally I'd only use one or the other as a standard and not mix them; it looks messy, that's all.
Using " instead of ' when:
<input value="user"/> //Standard html
<input value="user's choice"/> //Need to use single quote
<input onclick="alert('hi')"/> //When giving string as parameter for javascript function
Using ' instead of " when:
<input value='"User"'/> //Need to use double quote
var html = "<input name='username'/>" //When assigning html content to a javascript variable
I'm a newbie here, but I use single quote marks only when I need a double quote mark inside the value. In case that's not clear, here is an example:
<p align="center" title='One quote mark at the beginning so now I can
"cite".'> ... </p>
I hope I helped.
Lots of great insightful replies here! More than enough for anyone to make a clear and personal decision.
I would simply like to point out one thing that's always mattered to me.
And take this with a grain of salt!
Double quotes apply to strings that have more than a single phrase such as "one two" rather than single quotes for 'one' or 'two'. This can be traced as far back as C and C++.
(reference here or do your own online search).
And that's truly the difference.
With this principle (this difference), parsing became possible such as "{{'a','b'},{'x','y'}}" or "/[^\r\n]*[\r\n]" (which needed to be space independent because it's expressional) or more famously for HTML specific title = "Hello HTML!" or style = "font-family:arial; color:#FF0000;"
The funny thing here is that HTML (coming from SGML) commonly adopted double quotes due to expressional features even if it is a single character (e.g. number) or single-phrase string.
As NibblyPig pointed out quite well and straightforward:
" as a top-tier and ' as a second tier since "'a string here'" is valid and expected by W3 standards (which is for the web) and will most likely never change.
And for consistency, double quotes is wisely used, but only fully correct by preference.
In PHP using double quotes causes a slight decrease in performance because variable names are evaluated, so in practice, I always use single quotes when writing code:
echo "This will print you the value of $this_variable!";
echo 'This will literally say $this_variable with no evaluation.';
So you can write this instead;
echo 'This will show ' . $this_variable . '!';
I believe Javascript functions similarly, so a very tiny improvement in performance, if that matters to you.
Additionally, if you look all the way back to the HTML 2.0 spec, all the tags listed in the W3 HTML DTD Reference use double quotes. Consistency is important no matter which you tend to use more often.
Double quotes are used for strings (i.e., "this is a string") and single quotes are used for a character (i.e., 'a', 'b' or 'c'). Depending on the programming language and context, you can get away with using double quotes for a character but not single quotes for a string.
HTML doesn't care about which one you use. However, if you're writing HTML inside a PHP script, you should stick with double quotes as you will need to escape them (i.e., \"whatever\") to avoid confusing yourself and PHP.

escaping html inside comment tags

Escaping HTML is fine - it will remove <'s and >'s etc.
I've run into a problem where I am outputting a filename inside a comment tag, e.g. <!-- ${filename} -->
Of course things can go bad if you don't escape, so it becomes:
<!-- <c:out value="${filename}"/> -->
The problem is that if the file has "--" in the name, all the HTML gets screwed up, since you're not allowed to have <!-- -- -->.
The standard HTML escape doesn't escape these dashes, and I was wondering if anyone is familiar with a simple / standard way to escape them.
Definition of a HTML comment:
A comment declaration starts with <!, followed by zero or more comments, followed by >. A comment starts and ends with "--", and does not contain any occurrence of "--".
Of course the parsing of a comment is up to the browser.
Nothing strikes me as an obvious solution here, so I'd suggest you str_replace those double dashes out.
There is no good way to solve this. You can't just escape them because comments are read in plaintext. You will have to do something like put a space between the hyphens, or use some sort of code for hyphens (like [HYPHEN]).
Since it is obvious that you cannot directly display the '--'s, you can either encode them or use the fn:escapeXml or fn:replace functions for appropriate replacements.
JSTL documentation
There's no universally working way to escape those characters in HTML unless the - characters come in multiples of four: -- won't work in Firefox, but ---- will. So it all depends on the browser. For example, in Internet Explorer 8 it is not a problem; those characters are handled properly. The same goes for Google Chrome. However, Firefox, even the latest version (3.0.4), doesn't handle these characters well.
You shouldn't be trying to HTML-escape, the contents of comments are not escapable and it's fine to have a bare ‘>’ or ‘&’ inside.
‘--’ is its own, unrelated problem and is not really fixable. If you don't need to recover the exact string, just do a replacement to get rid of them (eg. replace with ‘__’).
If you do need to get a string through completely unmolested to a JavaScript that will be reading the contents of the comment, use a string literal:
<!-- 'my-string' -->
which the script can then read using eval(commentnode.data). (Yes, a valid use for eval() at last!)
Then your escaping problem becomes how to put things in JS string literals, which is fairly easily solvable by escaping the ‘'’ and ‘-’ characters:
<!-- 'Bob\x27s\x2D\x2Dstring' -->
(You should probably also escape ‘<’, ‘&’ and ‘"’, in case you ever want to use the same escaping scheme to put a JS string literal inside a <script> block or inline handler.)
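For the simple case where the exact string does not need to be recovered, the replacement approach suggested in several answers can be sketched like this. The example is in Python rather than JSP, and the names comment_safe and html_comment are made up; the substitution is lossy by design.

```python
def comment_safe(text):
    """Replace every '--' pair with '__' so no double hyphen survives (lossy)."""
    # str.replace works left to right on non-overlapping pairs, so an odd run
    # such as '---' becomes '__-' and can never leave two hyphens adjacent.
    return text.replace("--", "__")


def html_comment(text):
    """Hypothetical helper: wrap text in an HTML comment after sanitising it."""
    # The spaces keep a trailing single '-' from touching the closing '-->'.
    return f"<!-- {comment_safe(text)} -->"


print(html_comment("my--file.txt"))
print(html_comment("a---b----c"))
```

If the original filename has to be recoverable, a reversible encoding such as the \x2D scheme described above is the better fit.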

How can I remove an entire HTML tag (and its contents) by its class using a regex?

I am not very good with Regex but I am learning.
I would like to remove some html tag by the class name. This is what I have so far :
<div class="footer".*?>(.*?)</div>
The first .*? is because it might contain other attribute and the second is it might contain other html stuff.
What am I doing wrong? I have tried a lot of variations without success.
Update
Inside the DIV there can be multiple lines, and I am working with Perl regexes.
As other people said, HTML is notoriously tricky to deal with using regexes, and a DOM approach might be better. E.g.:
use HTML::TreeBuilder::XPath;
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_file( 'yourdocument.html' );
for my $node ( $tree->findnodes( '//*[@class="footer"]' ) ) {
$node->replace_with_content; # delete element, but not the children
}
print $tree->as_HTML;
You will also want to allow for other things before class in the div tag
<div[^>]*class="footer"[^>]*>(.*?)</div>
Also, go case-insensitive. You may need to escape things like the quotes, or the slash in the closing tag. What context are you doing this in?
Also note that HTML parsing with regular expressions can be very nasty, depending on the input. A good point is brought up in an answer below - suppose you have a structure like:
<div>
<div class="footer">
<div>Hi!</div>
</div>
</div>
Trying to build a regex for that is a recipe for disaster. Your best bet is to load the document into a DOM, and perform manipulations on that.
Pseudocode that should map closely to XML::DOM:
document = // load document
divs = document.getElementsByTagName("div");
for(div in divs) {
    if(div.getAttribute("class") == "footer") {
        parent = div.getParent();
        for(child in div.getChildren()) {
            // filter attribute types?
            // move the child up in front of the div (new node first, reference node second)
            parent.insertBefore(child, div);
        }
        // drop the now-empty div; to discard its contents too, skip the loop above
        parent.removeChild(div);
    }
}
Here is a perl library, HTML::DOM, and another, XML::DOM
.NET has built-in libraries to handle dom parsing.
In Perl you need the /s modifier, otherwise the dot won't match a newline.
That said, using a proper HTML or XML parser to remove unwanted parts of a HTML file is much more appropriate.
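As an illustration of the parser-based approach in a language other than Perl, here is a minimal sketch using only Python's standard html.parser. It drops any element carrying the target class together with everything nested inside it. ClassStripper and strip_class are made-up names, and the sketch assumes reasonably well-nested markup; it does not repair unclosed tags.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassStripper(HTMLParser):
    """Copies markup through, skipping elements that carry target_class."""

    def __init__(self, target_class):
        super().__init__(convert_charrefs=False)
        self.target_class = target_class
        self.kept = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def _has_target(self, attrs):
        return self.target_class in (dict(attrs).get("class") or "").split()

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
        elif self._has_target(attrs):
            if tag not in VOID_TAGS:
                self.skip_depth = 1
        else:
            self.kept.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> opens nothing, so depth is unchanged.
        if not self.skip_depth and not self._has_target(attrs):
            self.kept.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1
        else:
            self.kept.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.kept.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.kept.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.kept.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.skip_depth:
            self.kept.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.kept.append(f"<!{decl}>")


def strip_class(html, target_class):
    """Hypothetical helper: return html without elements of the given class."""
    parser = ClassStripper(target_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.kept)


doc = '<div>\n<div class="footer">\n<div>Hi!</div>\n</div>\n</div>'
print(strip_class(doc, "footer"))
```

Because the parser tracks nesting depth, the inner <div>Hi!</div> is removed along with the footer, which is exactly the case a single regex handles badly.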
<div[^>]*class="footer"[^>]*>(.*?)</div>
Worked for me, but needed to use backslashes before special characters
<div[^>]*class=\"footer\"[^>]*>(.*?)<\/div>
Partly depends on the exact regex engine you are using - which language etc. But one possibility is that you need to escape the quotes and/or the forward slash. You might also want to make it case insensitive.
<div class=\"footer\".*?>(.*?)<\/div>
Otherwise please say what language/platform you are using - .NET, java, perl ...
Try this:
<([^\s]+).*?class="footer".*?>([\s\S]*?)</([^\s]+)>
Your biggest problem is going to be nested tags. For example:
<div class="footer"><b></b></div>
The regexp given would match everything through the </b>, leaving the </div> dangling on the end. You will have to either assume that the tag you're looking for has no nested elements, or you will need to use some sort of parser from HTML to DOM and an XPath query to remove an entire sub-tree.
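This will be tricky because of the greediness of regular expressions. (Note that my examples may be specific to Perl, but I know that greediness is a general issue with REs.) A greedy .* in place of the second .*? will match as much as possible before the last </div> (the non-greedy .*? stops at the first </div> instead, which breaks as soon as divs are nested inside the footer), so if you have the following: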
<div class="SomethingElse"><div class="footer"> stuff </div></div>
The expression will match:
<div class="footer"> stuff </div></div>
which is not likely what you want.
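To check how the quantifiers actually behave, here is a small sketch with Python's re module. The two patterns are simplified stand-ins for the ones discussed in this thread, re.DOTALL plays the role of Perl's /s modifier, and first_match is a made-up helper name.

```python
import re

# Assumption: simplified versions of the patterns from the answers above.
LAZY = re.compile(r'<div[^>]*class="footer"[^>]*>(.*?)</div>',
                  re.DOTALL | re.IGNORECASE)
GREEDY = re.compile(r'<div[^>]*class="footer"[^>]*>(.*)</div>',
                    re.DOTALL | re.IGNORECASE)


def first_match(pattern, html):
    """Return the full text of the first match, or None if nothing matches."""
    match = pattern.search(html)
    return match.group(0) if match else None


outer = '<div class="SomethingElse"><div class="footer"> stuff </div></div>'
nested = '<div class="footer"><div>Hi!</div> tail</div>'

print(first_match(LAZY, outer))    # stops at the first </div>
print(first_match(GREEDY, outer))  # runs on to the last </div>
print(first_match(LAZY, nested))   # cut short by the inner </div>
```

Neither form is right for both inputs: the lazy pattern is cut short by nested divs, while the greedy one swallows the closing tag of the enclosing element.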
why not <div class="footer".*?</div> I'm not a regex guru either, but I don't think you need to specify that last bracket for your open div tag