Is this valid DTD? (official html 4.01 dtd) - html

The following declaration appears in the HTML 4.01 DTDs:
<!ELEMENT STYLE - - %StyleSheet -- style info -->
(see http://www.w3.org/TR/REC-html40/sgml/dtd.html it's in both strict.dtd and loose.dtd)
Apparently, the ; is missing after %StyleSheet. The reference should have been %StyleSheet;
But this is the official DTD of the holy HTML, by far the most important DTD of all DTDs, so what's going on there? Is it a valid entity reference like that?

It is valid without the semicolon in HTML 4.01 DTDs. Here's an extract from the W3C's HTML 4.01 Specification - On SGML and HTML:
... Instances of parameter entities in a DTD begin with "%", then the parameter entity name, and terminated by an optional ";".
In an XHTML DTD it wouldn't be valid; they follow this recommendation (because XHTML is XML): Extensible Markup Language (XML) 1.0 (Fifth Edition) - Character and Entity References:
... Definition: Parameter-entity references use percent-sign (%) and semicolon (;) as delimiters.

Related

HTML Semantics - <span> element and validity

Technically speaking would this block of code be valid (would a test say it's valid): <body><span>Some text</span></body> as opposed to <body><p><span>Some text</span></p></body> - which I know is valid
Yes, that HTML would be valid in both cases.
Take a look at the W3C HTML Validator to verify whether HTML is valid or not.
It is valid HTML5, which is the only HTML suitable for most practical needs today. However, formally it won't validate under outdated HTML 4 Strict and XHTML 1 Strict doctypes which required some block-level wrapper between body and text content.

HTML escape codes and differences between W3C HTML4 and HTML5 validator

The W3C HTML 4 fragment validator accepts this code:
<a href='http://www.sparql.org/sparql?query=+PREFIX+foaf%3A+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E%0D%0A+SELECT+%3FAgent%0D%0A+FROM+%3Chttp%3A%2F%2Fwww.w3.org%2F2012%2FpyRdfa%2Fextract%3Furi%3Dhttp%3A%2F%2Fontomatica.com%2Fpublic%2Ftest%2F2_infotext.html%3E%0D%0A+WHERE%0D%0A+{%0D%0A+%3Fs+foaf%3AAgent+%3FAgent+.%0D%0A+}%0D%0A&default-graph-uri=&output=text&stylesheet=%2Fxml-to-html.xsl' title='Click here to query the page using SPARQL'><img src='http://www.example.com/public/bin/logo_sparql.png' alt='Run SPARQL query'/></a>
When the same code is in a document submitted to the W3C validator using the HTML5 template, it reports:
Bad value for attribute href on element a: Illegal character in query: not a URL code point.
at the location SPARQL'><img
To my eye there is no char that must be escaped (e.g. a pipe char).
What do I change so the element is accepted by the HTML5 validator?
If you escape { and } as %7B and %7D, it works.
<a href='http://www.sparql.org/sparql?query=+PREFIX+foaf%3A+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E%0D%0A+SELECT+%3FAgent%0D%0A+FROM+%3Chttp%3A%2F%2Fwww.w3.org%2F2012%2FpyRdfa%2Fextract%3Furi%3Dhttp%3A%2F%2Fontomatica.com%2Fpublic%2Ftest%2F2_infotext.html%3E%0D%0A+WHERE%0D%0A+%7B%0D%0A+%3Fs+foaf%3AAgent+%3FAgent+.%0D%0A+%7D%0D%0A&default-graph-uri=&output=text&stylesheet=%2Fxml-to-html.xsl' title='Click here to query the page using SPARQL'><img src='http://www.example.com/public/bin/logo_sparql.png' alt='Run SPARQL query'/></a>
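As a rough illustration of the escaping involved, the following Python sketch percent-encodes a query fragment with the standard library. The fragment is a shortened, hypothetical piece of the SPARQL query above, not the full URL; the point is only that the curly braces come out as %7B and %7D.

```python
from urllib.parse import quote_plus

# Shortened, hypothetical fragment of the SPARQL query from the link above.
query_fragment = "WHERE { ?s foaf:Agent ?Agent . }"

# quote_plus turns spaces into "+" and percent-encodes every character
# outside the unreserved set, which includes the curly braces.
encoded = quote_plus(query_fragment)
print(encoded)
```

Building the query string this way, instead of writing it by hand, avoids leaving a raw { or } in the href value.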

Validity of href attribute without a value

Is <a href>some text</a> a valid html code?
This one <a href="">some text</a> is valid.
What about the first one? Is that a typo or it is allowed to skip =""?
The markup <a href>some text</a> is valid (and equivalent to <a href="">some text</a>) according to HTML5 CR in HTML serialization, but not otherwise.
General HTML5 rules on HTML serialization (HTML syntax) allow empty attribute syntax: “Just the attribute name. The value is implicitly the empty string.” And the empty string is a valid URL, referring to the current document.
In XHTML, <a href>some text</a> is invalid and not even well-formed, since well-formedness rules (i.e., general XML syntax rules) require that an attribute specification is of the form name="value" or name='value', with no shortcuts.
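The well-formedness point can be checked with any XML parser. Here is a minimal sketch using Python's standard xml.etree.ElementTree; the helper name is_well_formed is made up for this example, and it tests generic XML well-formedness only, not XHTML validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Attribute without a value: rejected by the XML parser.
print(is_well_formed('<a href>some text</a>'))

# Attribute with an explicit empty value: accepted.
print(is_well_formed('<a href="">some text</a>'))
```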
In earlier HTML specs, up to and including HTML 4.01, <a href>some text</a> is invalid but on other grounds. By the formal rules, an attribute value may never be omitted from an attribute specification, but the name and the equals sign may be omitted, if the attribute is declared with an enumerated set of values. So <a href>some text</a> would be valid if there were an attribute for a declared with enumerated values so that one of them is href (and there is only one such attribute). But there is no such attribute.
It depends on your doctype. More importantly, the rendering depends on your client's browser implementation. Chrome, FF, IE>7, etc, these browsers know what you meant, and can pick up the pieces just fine.
HTML5
<!DOCTYPE html>
The validator says:
Valid, but WARNING: Attribute href without an explicit value seen. The attribute may be dropped by IE7.
XHTML1.0 Strict and XHTML1.0 Transitional
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
and
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
The validator says:
Invalid: "href" is not a member of a group specified for any attribute
HTML 4.01 Strict and HTML 4.01 Loose
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"><html>
and
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html>
The validator says:
Invalid: "HREF" is not a member of a group specified for any attribute
No it's not valid. Use:
<span class="pseudo-link">Some text</span>
and define styles in CSS for class pseudo-link to look like a normal link.
The :hover selector will be important for changing the font color and underlining the text when the mouse is over it.
You can then define an action for this pseudo-link with JavaScript.

After ATTLIST declaration in DTD browser renders custom character

I declared rel="value" attribute for <li> element in DTD like this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd" [<!ATTLIST li rel CDATA #IMPLIED>]>
After that my code with <li rel="value"></li> was valid, but I got another issue: the browser renders the characters "]>" in the document.
How to fix this?
You should not use an internal subset in a doctype declaration, because browsers do not understand it, or DTDs at all.
If you use a simple added attribute, for some reason, it is often best to just be careful enough with it, or “check it manually”. But to perform DTD-based validation, you would need to construct an external DTD, based on the DTD you wish to use as basis, and with the extra markup added into it. In this case, you would copy the HTML 4.01 Transitional DTD and replace
<!ATTLIST LI
%attrs;>
by
<!ATTLIST LI
rel CDATA #IMPLIED
%attrs;>
(That is, you need to provide the full list of allowed attributes, with your custom attribute added, instead of declaring an attribute list that only allows your attribute [unless that’s what you really want].)
You would then use a doctype declaration that refers to your modified copy by its URL, with
<!DOCTYPE HTML SYSTEM "dtdurl">
where dtdurl is an absolute URL for the DTD. More info: Creating your own DTD for HTML validation.
It is generally not advisable to add attributes of your own, as they may clash with attributes that might be added to HTML in some future version. According to HTML5 drafts, attributes with names starting with data- are meant for site-specific use and will never have any publicly defined meaning, so data-rel would be safer than rel.
Browsers don't understand embedded SGML. They simply stop reading the doctype at the first > character. So they see the following ]> as text to be rendered.
Just don't use embedded SGML.
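To see why exactly "]>" ends up on the page, here is a deliberately simplified Python model of that behavior. It is not a real HTML tokenizer; the function name split_like_browser is hypothetical, and it only assumes what is described above: the doctype ends at the first > character.

```python
# The doctype from the question, with its internal subset.
DOCTYPE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
    '"http://www.w3.org/TR/html4/loose.dtd" '
    '[<!ATTLIST li rel CDATA #IMPLIED>]>'
)


def split_like_browser(source):
    """Split at the first '>' as a simplified model of doctype handling."""
    head, gt, rest = source.partition(">")
    return head + gt, rest


consumed, leftover = split_like_browser(DOCTYPE)
print(consumed)  # everything the browser treats as the doctype
print(leftover)  # what is left over and rendered as text
```

The first > belongs to the ATTLIST declaration, so the closing "]>" of the internal subset is what remains.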
Use a pseudo-attribute &gt; delimiter instead of a literal > delimiter to escape the nested > within ]>:
<!DOCTYPE HTML PUBLIC
"-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd"
[<!ATTLIST li rel CDATA #IMPLIED&gt;]>
References
Pseudo-Attributes

How to include HTML entities into an XML file

In Firefox:
<?xml version="1.0" encoding="utf-8"?>
<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
<mi>&rho;</mi>
</math>
results in "undefined entity" error.
I know there is something missing there. I just don't know what I should write to correct the problem. I would like to avoid rewriting every single entity as a literal Unicode character in the document.
EDIT I tried the following, still not working, same error :
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE math [
<!ENTITY % HTMLlat1 PUBLIC
"-//W3C//ENTITIES Latin 1 for XHTML//EN"
"xhtml-lat1.ent">
%HTMLlat1;
<!ENTITY % HTMLsymbol PUBLIC
"-//W3C//ENTITIES Symbols for XHTML//EN"
"xhtml-symbol.ent">
%HTMLsymbol;
<!ENTITY % HTMLspecial PUBLIC
"-//W3C//ENTITIES Special for XHTML//EN"
"xhtml-special.ent">
%HTMLspecial;
]>
<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
<mi>&rho;</mi>
</math>
EDIT In chrome, this results in the following message :
error on line 6 at column 13: PEReference: %HTMLlat1; not found
warning on line 10 at column 15: PEReference: %HTMLsymbol; not found
warning on line 14 at column 16: PEReference: %HTMLspecial; not found
EDIT Tried to download the .ent files and change the reference to either a local http:// path or file:/// path with no success.
A similar post about the subject : XML catalog in PHP
EDIT Quick workaround for browsers :
<!DOCTYPE html>
<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
<mi>&rho;</mi>
</math>
You need to suppress the XML header, so it is understood as HTML.
Nevertheless, this doesn't answer the question, as the question was to import entities, while the document is declared as XML.
ANSWER
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE math PUBLIC "-//W3C//DTD MathML 2.0//EN" "http://www.w3.org/Math/DTD/mathml2/mathml2.dtd">
<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
<mi>&rho;</mi>
</math>
Note the definitions in XHTML1 and MathML2 are now obsolete and not aligned with the definitions that are built in to HTML parsers in current browsers. The current definitions as used in MathML3 and HTML5 are defined here
http://www.w3.org/2003/entities/2007doc/Overview.html
which is the editors (my:-) draft, with a link at the top to the REC version.
A single file set of DTD declarations for the entities is
http://www.w3.org/2003/entities/2007/htmlmathml-f.ent
Generally speaking, it is better to use numeric references rather than named entities in an XML context, as browsers will not fetch the externally referenced DTD.
Browsers following the HTML(5) spec will use a built-in set of definitions derived from the above spec if you refer to the XHTML or MathML2 DTD via the public identifiers (i.e. they do not use the entity definitions that you specify).
See the related bug against the HTML spec:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=13409
Add the MathML 2.0 doctype, after the XML declaration:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE math
PUBLIC "-//W3C//DTD MathML 2.0//EN"
"http://www.w3.org/Math/DTD/mathml2/mathml2.dtd"
>
The reason is that handling of entity references is very kludgy in web browsers. They do not actually read DTDs. Instead, they have built-in tables of predefined entities, which can be turned on by using specific doctype strings. This is string magic, and e.g. using the MathML 3.0 doctype will not work. Cf. XML to XHTML using XSLT: using entities such as &Sum; (which is a MATHML entity) (especially Martin Honnen’s comment on an answer).
Alternatively, use characters as such or, if your authoring system cannot produce them conveniently, character references like &#x3c1;.
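If the document already contains many named references, the conversion to numeric character references can be scripted. The following Python sketch is one possible approach, not part of the original answers: the function name to_numeric_references is made up, and it relies on html.entities.name2codepoint, which only covers the HTML 4 entity names, so MathML-only names would be left untouched.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities every XML parser knows; leave these alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def to_numeric_references(text):
    """Replace known named entity references with decimal character references."""

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # unknown or predefined: keep as written
        return "&#{};".format(name2codepoint[name])

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)


source = "<mi>&rho;</mi>"
converted = to_numeric_references(source)
print(converted)

# The converted markup parses as plain XML without any DTD.
print(ET.fromstring(converted).text)
```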
If you can modify the XML to include an inline DTD, you can define the entities there:
<!DOCTYPE yourRootElement [
<!ENTITY bull "•">
.... ]>
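As a small check that an internal subset is enough for a non-browser XML parser, here is a sketch with Python's standard xml.etree.ElementTree. The root element name note and the sample text are placeholders chosen for this example; browsers may still behave differently, as discussed above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the entity is declared in the internal DTD subset.
DOCUMENT = """<!DOCTYPE note [
  <!ENTITY bull "&#8226;">
]>
<note>&bull; first item</note>"""

# The parser expands &bull; using the declaration from the internal subset.
root = ET.fromstring(DOCUMENT)
print(root.text)
```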