Quotation marks in HTML attribute values? - html

This may seem like a really basic question, but...
How do you use double quotation marks in HTML code (alt attributes and the like)?
For example:
I'm trying to set an attribute in my webpage to Opening Credits for "It's Liverpool", but the value is being cut off at Opening Credits for.

You'll want to use the corresponding HTML entity in place of the quotes:
<span alt="Opening Credits for &quot;It's Liverpool&quot;">A span</span>

You can normally avoid the issue by using appropriate language-dependent quotation marks, instead of Ascii quotation marks, which should be confined to use as delimiters in computer code. Example:
alt="Opening Credits for “It’s Liverpool”"
or (in British English)
alt="Opening Credits for ‘It’s Liverpool’"
Should you really need to use Ascii quotation marks inside an attribute value, use Ascii apostrophes as delimiters:
alt='The statement foo = "bar" is an assignment.'
In the extremely rare case where an attribute value really needs to contain both an Ascii quotation mark and an Ascii apostrophe, you need to escape either of them (namely the one you decide to use as attribute value delimiter):
alt="The Ascii characters &quot; and ' should not be used in natural languages."
or
alt='The Ascii characters " and &#39; should not be used in natural languages.'
Note that these considerations are relevant only inside attribute values. In element content, both " and ' can be used freely:
<strong>The Ascii characters " and ' should not be used in natural languages.</strong>
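If the markup is generated by a program, the escaping can be left to a library instead of being done by hand. A minimal sketch in Python, assuming the standard-library html.escape function; the helper name alt_attribute is hypothetical and only for illustration:

```python
from html import escape


def alt_attribute(text):
    """Return a double-quoted alt attribute with its value escaped."""
    # quote=True also replaces " and ', so the value cannot close the
    # attribute early, whichever delimiter is used around it.
    return 'alt="' + escape(text, quote=True) + '"'


print(alt_attribute('Opening Credits for "It\'s Liverpool"'))
```

The printed attribute contains &quot; in place of each double quote, which is the same fix as the hand-written example at the top of this answer.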

Related

Why is `'` escaped in html libs?

With HTML, I notice that some libraries escape '. My question is why? The first time, I thought maybe they did it just because, but I have seen more than one do it, though not all. I can't remember what I looked at off the top of my head, but the other escaped characters I remember were &, <, >, ".
I know & is used to start escape sequences (such as &amp; to produce &). < and > are escaped so they are not confused with start/end tags, and " is escaped so you can put " in tag attributes if you need to for some reason. But why '? Also, am I missing any other characters that should be escaped?
Because under HTML, the single-quote character ' can be used to delimit element attributes instead of the double-quote, like so:
<p class='something'></p>
The character does not normally need to be escaped, but it's best to be safe.
In HTML, " and ' are interchangeable. Both can be used for setting the attribute for an element as well as used for denoting a string in JavaScript:
<img src="bob.png" />
<img src='bob.png' />
The single-quote mark is escaped because it can only be used as-is in certain contexts. When writing an escaping function, it is easier and faster to just always escape it, so you don't have to take into account the context.
For example, if you use double quotes " to denote an attribute value, you can use a single quote ' within it safely. However, if you use single quotes to denote the attribute value, you cannot.
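To illustrate the "always escape it" approach, here is a small sketch using Python's standard html.escape as one example of such a library function. The sample value is made up:

```python
from html import escape

value = 'say "hi" to Bob\'s dog'
escaped = escape(value)  # quote=True is the default

# Because both quote characters were replaced, the same escaped string
# can be dropped between either kind of delimiter without checking
# which one the template happens to use.
double_quoted = f'<img alt="{escaped}">'
single_quoted = f"<img alt='{escaped}'>"

print(double_quoted)
print(single_quoted)
```

The function never needs to know the context it is writing into, which is the simplification described above.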

Newlines and special characters in HTML attributes

My questions are simple:
Is the following valid? If it is, would it break in some browsers?
<div data-text="Blah blah blah
More blah
And just a little extra blah to finish"> ... </div>
Which characters "must" be encoded in attribute values? I know " should be &quot;, but are any others required to be encoded?
Is the following valid?
It's a valid fragment of HTML5, yes.
would it break in some browsers?
Unlikely.
Which characters "must" be encoded in attribute values? I know " should be &quot;, but are any others required to be encoded?
That depends on whether the attribute value is double quoted, single quoted or unquoted.
For the double quoted form " must be replaced by its character reference, and & may need to be replaced by its character reference depending on the characters that follow it. See attribute-value-double-quoted-state
For the single quoted form ' must be replaced by its character reference, and & may need to be replaced by its character reference depending on the characters that follow it. See attribute-value-single-quoted-state
For the unquoted form TAB, LINEFEED, FORMFEED, SPACE, > must be replaced by their character references, and & may need to be replaced by its character reference depending on the characters that follow it. See attribute-value-unquoted-state
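The three cases can be sketched as one function. This is only an illustration of the rules as listed in this answer, not a complete implementation of the spec: the function name escape_attr and its style parameter are hypothetical, and it always escapes & as a conservative simplification.

```python
def escape_attr(value, style="double"):
    """Escape value for a 'double', 'single' or 'unquoted' attribute."""
    # Always escaping & is the cautious choice; strictly it is only
    # needed when the following characters would form a reference.
    out = value.replace("&", "&amp;")
    if style == "double":
        return out.replace('"', "&quot;")
    if style == "single":
        return out.replace("'", "&#39;")
    if style == "unquoted":
        # The characters named above: TAB, LINEFEED, FORMFEED, SPACE, >
        for ch in "\t\n\f >":
            out = out.replace(ch, f"&#{ord(ch)};")
        return out
    raise ValueError(f"unknown style: {style}")


print(escape_attr('a "b" & c'))
print(escape_attr("it's", "single"))
print(escape_attr("a b>c", "unquoted"))
```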
HTML 5 spec
There are different requirements for different attributes so there isn't one answer.
For instance, title attributes allow line feeds, but a class attribute is a space-separated list of string tokens.
For data-* attributes, though, the spec says of the attribute name:
contains no characters in the range U+0041 to U+005A (LATIN CAPITAL LETTER A to LATIN CAPITAL LETTER Z).
Other than that, it doesn't make any distinctions.

What to use " or ' when coding [duplicate]

Possible Duplicate:
When did single quotes in HTML become so popular?
Should I use ' or " when coding, for example width='100px' or width="100px". Is it only a matter of taste, or does it matter to the browsers?
The reason why I ask, I have always used "" for everything, so when I code with PHP, I have to escape like this:
echo "<table width=\"100px\">";
But I've found out that I will probably save 2 minutes per day if I do this:
echo "<table width='100px'>";
Fewer keystrokes. Of course I could also do this:
echo '<table width="100px">';
What should the HTML look like; 'option1' or "option2"?
Yes, it's a matter of taste. It makes no difference in HTML. Quoting the W3C on SGML and HTML:
By default, SGML requires that all attribute values be delimited using either double quotation marks (ASCII decimal 34) or single quotation marks (ASCII decimal 39). Single quote marks can be included within the attribute value when the value is delimited by double quote marks, and vice versa.
...
In certain cases, authors may specify the value of an attribute without any quotation marks. The attribute value may only contain letters (a-z and A-Z), digits (0-9), hyphens (ASCII decimal 45), periods (ASCII decimal 46), underscores (ASCII decimal 95), and colons (ASCII decimal 58). We recommend using quotation marks even when it is possible to eliminate them.
Note, however, that the width attribute is deprecated in HTML 4.01 only when used in <hr>, <pre>, <td> and <th>; it is not deprecated when used in <table>, and it is still supported in all major browsers (Source 1, 2, 3, 4).
HTML5 supports attributes specified in any of these four different ways:
Empty attribute syntax: <input disabled>
Unquoted attribute value syntax: <input value=yes>
Single-quoted attribute value syntax: <input type='checkbox'>
Double-quoted attribute value syntax: <input name="be evil">
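To see that the four syntaxes differ only in how they are written, you can run them through a parser. A minimal sketch using Python's standard html.parser module; the class name AttrCollector is made up for this example:

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect the (name, value) pairs of every start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.extend(attrs)


collector = AttrCollector()
collector.feed(
    "<input disabled><input value=yes>"
    "<input type='checkbox'><input name=\"be evil\">"
)
collector.close()
print(collector.seen)
```

The delimiters are gone from the collected values, and the empty attribute is reported with a value of None.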
For me it is better to use ' than ". PHP parses a string wrapped in "" for variables and escape sequences, whereas a string wrapped in ' is treated as a plain string with no further parsing, so using ' is faster.
I agree with Daniel - it's a matter of taste.
Personally, I use double quotes when assigning values to attributes (e.g. width="100%"). Also, if you deal with JSON, strictly speaking, the names and values should be surrounded with double quotes ("").
echo '<table width="100px">'
Because other HTML tags are probably with double quotes, keep it clean, you don't need to escape quotes this way.

Why shouldn't `&apos;` be used to escape single quotes?

As stated in, When did single quotes in HTML become so popular? and Jquery embedded quote in attribute, the Wikipedia entry on HTML says the following:
The single-quote character ('), when used to quote an attribute value, must also be escaped as &#39; or &#x27; (should NOT be escaped as &apos; except in XHTML documents) when it appears within the attribute value itself.
Why shouldn't &apos; be used? Also, is &quot; safe to be used instead of &#34;?
&quot; is on the official list of valid HTML 4 entities, but &apos; is not.
From C.16. The Named Character Reference &apos;:
The named character reference &apos;
(the apostrophe, U+0027) was
introduced in XML 1.0 but does not
appear in HTML. Authors should
therefore use &#39; instead of
&apos; to work as expected in HTML 4
user agents.
&quot; is valid in both HTML5 and HTML4.
&apos; is valid in HTML5, but not HTML4. However, most browsers support &apos; for HTML4 anyway.
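The HTML 4 versus HTML5 difference can be checked against the entity tables that ship with Python, assuming its html.entities module (name2codepoint holds the HTML 4 names, html5 holds the HTML5 named character references). This is only a lookup sketch, not a statement about any particular browser:

```python
from html import unescape
from html.entities import html5, name2codepoint

# HTML 4 entity names, as shipped in the standard library table.
print("quot in HTML 4 table:", "quot" in name2codepoint)
print("apos in HTML 4 table:", "apos" in name2codepoint)

# HTML5 named character references (keys include the trailing ';').
print("apos; in HTML5 table:", "apos;" in html5)

# unescape() follows the HTML5 rules, so all three decode.
print(unescape("&quot; &apos; &#39;"))
```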
&apos; is not part of the HTML 4 standard.
&quot; is, though, so is fine to use.
If you need to write semantically correct mark-up, even in HTML5, you must not use &apos; to escape single quotes. Although I can imagine you actually meant an apostrophe rather than a single quote.
Single quotes and apostrophes are not the same semantically, although they might look the same.
Here's one apostrophe.
Use &#39; to insert it if you need HTML4 support. (edited)
In British English, single quotes are used like this:
"He told me to 'give it a try'", I said.
Quotes come in pairs. You can use:
<p><q>He told me to <q>give it a try</q></q>, I said.</p>
to have nested quotes in a semantically correct way, deferring the substitution of the actual characters to the rendering engine. This substitution can then be affected by CSS rules, like:
q {
quotes: '"' '"' '<' '>';
}
An old but seemingly still relevant article about semantically correct mark-up: The Trouble With EM ’n EN (and Other Shady Characters).
(edited) This used to be:
Use &#8217; to insert it if you need HTML4 support.
But, as @James_pic pointed out, that is not the straight single quote, but the "Single curved quote, right".
If you really need curly single quotes or apostrophes, you can use
html    | numeric | hex
&lsquo; | &#8216; | &#x2018; // for the left/beginning single-quote and
&rsquo; | &#8217; | &#x2019; // for the right/ending single-quote

HTML code for an apostrophe

Seemingly simple, but I cannot find anything relevant on the web.
What is the correct HTML code for an apostrophe? Is it &#8217;?
If you are looking for the straight apostrophe ' (U+0027), it is
&#39; or &apos; (the latter is HTML5 only)
If you are looking for the curly apostrophe ’ (U+2019), then yes, it is
&#8217; or &rsquo;
As for which one to use, there are great answers in the Graphic Design community: What’s the right character for an apostrophe?.
A List Apart has a nice reference on characters and typography in HTML. According to that article, the correct HTML entity for the apostrophe is &#8217;. Example use: ’ .
It's &apos;.
As noted by msanders, this is actually XML and XHTML but not defined in HTML4, so I guess use &#39; in that case. I stand corrected.
A standard-compliant, easy-to-remember set of html quotes, starting with the right single-quote which is normally used as an apostrophe:
right single-quote — &rsquo; — ’
left single-quote — &lsquo; — ‘
right double-quote — &rdquo; — ”
left double-quote — &ldquo; — “
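Each of these named entities has a decimal and a hexadecimal numeric equivalent. A quick way to confirm that all three spellings decode to the same character is Python's standard html.unescape; the dictionary below is just illustrative data for the four quotes in the list:

```python
from html import unescape

quotes = {
    "right single-quote": ("&rsquo;", "&#8217;", "&#x2019;"),
    "left single-quote": ("&lsquo;", "&#8216;", "&#x2018;"),
    "right double-quote": ("&rdquo;", "&#8221;", "&#x201D;"),
    "left double-quote": ("&ldquo;", "&#8220;", "&#x201C;"),
}

for name, refs in quotes.items():
    # A set collapses identical characters, so one element means
    # every spelling decoded to the same code point.
    decoded = {unescape(ref) for ref in refs}
    print(name, decoded, len(decoded) == 1)
```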
Depends on which apostrophe you are talking about: there’s &apos;, &lsquo;, &rsquo; and probably numerous other ones, depending on the context and the language you’re intending to write. And with a declared character encoding of e.g. UTF-8 you can also write them directly into your HTML: ', ‘, ’.
Firstly, it would appear that &apos; should be avoided -
The curse of &apos;
Secondly, if there is ever any chance that you're going to generate markup to be returned via AJAX calls, you should avoid the entity names (As not all of the HTML entities are valid in XML) and use the &#XXXX; syntax instead.
Failure to do so may result in the markup being considered as invalid XML.
The entity that is most likely to be affected by this is &nbsp;, which should be replaced by &#160;
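If the markup already contains entity names, the conversion can be automated. A rough sketch in Python, assuming the HTML 4 name table in the standard html.entities module; the function name named_to_numeric is hypothetical, and the five names that XML predefines are left untouched:

```python
import re
from html.entities import name2codepoint

# Names that XML itself defines and that can stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def named_to_numeric(markup):
    """Rewrite HTML 4 named entities as decimal numeric references."""

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave unknown or XML-safe names alone
        return f"&#{name2codepoint[name]};"

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, markup)


print(named_to_numeric("a&nbsp;b &amp; c&rsquo;s"))
```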
Here is a great reference for HTML Ascii codes:
http://www.ascii.cl/htmlcodes.htm
The code you are looking for is: &#39;
Note that &apos; IS defined in HTML5, so for modern websites, I would advise using &apos; as it is much more readable than &#39;
Check: http://www.w3.org/TR/html5/syntax.html#named-character-references
Even though &apos; reads nicer than &#39; and it's a shame not to use it, as a fail-safe, use &#39;.
&apos; is a valid HTML 5 entity, however it is not a valid HTML 4 entity.
Unless <!DOCTYPE html> is at the top of your HTML document, use &#39;
Sorry if this offends anyone, but there is a reasonable article on Ted Clancy's blog that argues against the Unicode committee's recommendation to use ’ (RIGHT SINGLE QUOTATION MARK) and proposes using U+02BC (MODIFIER LETTER APOSTROPHE) (aka &#x2bc; or &#700;) instead.
In a nutshell, the article argues that:
A punctuation mark (such as a quotation mark) normally separates words and phrases, while the sides of a contraction really can't be separated and still make sense.
Using a modifier allows one to select a contraction with the regular expression \w+
It's easier to parse quotes embedded in text if there aren't quotation marks also appearing in contractions
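The regular-expression point is easy to try out. A small sketch using Python's re module, whose \w is Unicode-aware for str patterns; the sample word is made up:

```python
import re

modifier = "don\u02bct"    # U+02BC MODIFIER LETTER APOSTROPHE
quote_mark = "don\u2019t"  # U+2019 RIGHT SINGLE QUOTATION MARK

# U+02BC is classed as a letter, so \w+ keeps the contraction whole.
print(re.findall(r"\w+", modifier))

# U+2019 is punctuation, so the same pattern splits the word in two.
print(re.findall(r"\w+", quote_mark))
```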
&#39; in decimal.
%27 in hex.
Although the &apos; entity may be supported in HTML5, it looks like a typewriter apostrophe. It looks nothing like a real curly apostrophe—which looks identical to an ending quotation mark: ’.
Just look when I write them after each other:
1: right single quotation mark entity, 2: apostrophe entity: ’ &apos;.
I tried to find a proper entity or alt command specifically for a normal looking apostrophe (which again, looks ‘identical’ to a closing right single quotation mark), but I haven’t found one. I always need to insert a right single quotation mark in order to get the visually correct apostrophe.
If you use just ’ (ALT + 0146) or autoformat typewriter apostrophes and quotation marks as curly in a word processor like Word 2013, do use <meta charset="UTF-8">.
I've found FileFormat.info's Unicode Character Search to be most helpful in finding exact character codes.
Entering simply ' (the character to the left of the return key on my US Mac keyboard) into their search yields several results of various curls and languages.
I would presume the original question was asking for the typographically correct U+02BC ʼ, rather than the typewriter facsimile U+0027 '.
The W3C recommends hex codes for HTML entities (see below). For U+02BC that would be &#x2bc;, rather than &#x27; for U+0027.
http://www.w3.org/International/questions/qa-escapes
Using character escapes in markup and CSS
Hex vs. decimal. Typically when the Unicode Standard refers to or lists characters it does so using a hexadecimal value. … Given the prevalence of this convention, it is often useful, though not required, to use hexadecimal numeric values in escapes rather than decimal values…
http://www.w3.org/TR/html4/charset.html
5 HTML Document Representation … 5.4 Undisplayable characters
…If missing characters are presented using their numeric representation, use the hexadecimal (not decimal) form, since this is the form used in character set standards.
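Going from a character to its hex or decimal reference is a one-liner in most languages. A minimal sketch in Python; the helper names hex_reference and decimal_reference are made up for this example:

```python
def hex_reference(char):
    """Return the hexadecimal numeric character reference for char."""
    return f"&#x{ord(char):X};"


def decimal_reference(char):
    """Return the decimal numeric character reference for char."""
    return f"&#{ord(char)};"


for char in ("'", "\u2019", "\u02bc"):
    # The hex form lines up with the U+XXXX notation used by Unicode.
    print(f"U+{ord(char):04X}", hex_reference(char), decimal_reference(char))
```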
Just a one more link with a nicely maintained collection Html Entities (archived), and its current (2023-01-22) status Named Character References.
As far as I know it is &#39; but it seems yours works as well
See http://w3schools.com/tags/ref_ascii.asp
Use &apos; for a straight apostrophe. This tends to be more readable than the numeric &#39; (if others are ever likely to read the HTML directly).
Edit: msanders points out that &apos; isn't valid HTML4, which I didn't know, so follow most other answers and use &#39;.
You can try &#x27; as seen in http://unicodinator.com/#0027