In Firefox:
<?xml version="1.0" encoding="utf-8"?>
<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
<mi>&rho;</mi>
</math>
results in "undefined entity" error.
I know there is something missing there. I just don't know what I should write to correct the problem. I would like to avoid rewriting every single unicode character into the document.
EDIT I tried the following, still not working, same error :
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE math [
<!ENTITY % HTMLlat1 PUBLIC
"-//W3C//ENTITIES Latin 1 for XHTML//EN"
"xhtml-lat1.ent">
%HTMLlat1;
<!ENTITY % HTMLsymbol PUBLIC
"-//W3C//ENTITIES Symbols for XHTML//EN"
"xhtml-symbol.ent">
%HTMLsymbol;
<!ENTITY % HTMLspecial PUBLIC
"-//W3C//ENTITIES Special for XHTML//EN"
"xhtml-special.ent">
%HTMLspecial;
]>
<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
<mi>&rho;</mi>
</math>
EDIT In chrome, this results in the following message :
error on line 6 at column 13: PEReference: %HTMLlat1; not found
warning on line 10 at column 15: PEReference: %HTMLsymbol; not found
warning on line 14 at column 16: PEReference: %HTMLspecial; not found
EDIT Tried to download the .ent files and change the reference to either a local http:// path or file:/// path with no success.
A similar post about the subject : XML catalog in PHP
EDIT Quick workaround for browsers :
<!DOCTYPE html>
<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
<mi>&rho;</mi>
</math>
You need to suppress the XML header, so it is understood as HTML.
Nevertheless, this doesn't answer the question, as the question was to import entities, while the document is declared as XML.
ANSWER
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE math PUBLIC "-//W3C//DTD MathML 2.0//EN" "http://www.w3.org/Math/DTD/mathml2/mathml2.dtd">
<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
<mi>&rho;</mi>
</math>
Note the definitions in XHTML1 and MathML2 are now obsolete and not aligned with the definitions that are built in to HTML parsers in current browsers. The current definitions as used in MathML3 and HTML5 are defined here
http://www.w3.org/2003/entities/2007doc/Overview.html
which is the editors (my:-) draft, with a link at the top to the REC version.
A single file set of DTD declarations for the entities is
http://www.w3.org/2003/entities/2007/htmlmathml-f.ent
Generally speaking, it is better to use numeric references rather than named entities in an XML context, as browsers will not fetch the externally referenced DTD.
Browsers following the HTML(5) spec will use a built in set of definitions derived from the above spec if you refer to the xhtml or mathml2 dtd via the public identifiers (ie they do not use the entity definitions that you specify).
see related bug against the HTML spec
https://www.w3.org/Bugs/Public/show_bug.cgi?id=13409
Add the MathML 2.0 doctype, after the XML declaration:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE math
PUBLIC "-//W3C//DTD MathML 2.0//EN"
"http://www.w3.org/Math/DTD/mathml2/mathml2.dtd"
>
The reason is that handling of entity references is very kludgy in web browsers. They do not actually read DTDs. Instead, they have built-in tables of predefined entities, which can be turned on by using specific doctype strings. This is string magic, and e.g. using the MathML 3.0 doctype will not work. Cf. XML to XHTML using XSLT: using entities such as &sum; (which is a MATHML entity) (especially Martin Honnen’s comment on an answer).
Alternatively, use the characters as such or, if your authoring system cannot produce them conveniently, character references like &#x3C1;.
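If typing character references by hand is the obstacle, the conversion can be scripted. The following is a minimal sketch in Python, not part of the original answer. It assumes the entity names in the document are covered by the standard library's html.entities.name2codepoint table (the HTML 4.01 set, which includes rho but not the MathML-only names). The helper name named_to_numeric is hypothetical, and the regex is naive: it does not skip comments or CDATA sections.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities every XML parser already knows; leave them untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def named_to_numeric(text):
    """Rewrite known named entity references as hexadecimal character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # predefined or unknown name: keep as written
        return "&#x{:X};".format(name2codepoint[name])
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)

source = (
    '<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mi>&rho;</mi></math>"
)
converted = named_to_numeric(source)

# The converted document parses without any DTD being available.
root = ET.fromstring(converted)
mi = root.find("{http://www.w3.org/1998/Math/MathML}mi")
```

After conversion the document no longer depends on any doctype, so it should not matter which entity tables a given parser has built in.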
If you can modify the XML to include an inline DTD you can define the entities there:
> <!DOCTYPE yourRootElement [
> <!ENTITY bull "•">
> .... ]>
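The inline-DTD approach can be checked outside a browser. The sketch below is an illustration only: it uses Python's standard xml.etree.ElementTree parser, applies the pattern to the rho entity from the question, and says nothing about how any particular browser treats the internal subset.

```python
import xml.etree.ElementTree as ET

# The entity is declared in the internal subset, so no external DTD is fetched.
document = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE math [
  <!ENTITY rho "&#x3C1;">
]>
<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
  <mi>&rho;</mi>
</math>"""

root = ET.fromstring(document)
mi = root.find("{http://www.w3.org/1998/Math/MathML}mi")
rho_text = mi.text  # the declared entity has been expanded by the parser

# Without the declaration, the same reference is a well-formedness error.
try:
    ET.fromstring(b"<mi>&rho;</mi>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
```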
Related
I want a client-side XSL-transformed document with elements targettable (jumpable to) by #foo (URL fragments). Problem is, as soon as I attach the simplest XSL stylesheet, Firefox stops scrolling to the elements. Here's simple code:
test.xml:
<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type='text/xsl' href='test.xsl'?>
<!DOCTYPE foo [<!ATTLIST bar id ID #REQUIRED>]>
<foo xmlns:html='http://www.w3.org/1999/xhtml' xml:lang='en-GB'>
<html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/>
<bar id='baz'>Baf.</bar>
</foo>
test.xsl:
<xsl:stylesheet version='1.0' xmlns:html='http://www.w3.org/1999/xhtml' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match='/'>
<xsl:copy-of select='.'/>
</xsl:template>
</xsl:stylesheet>
As soon as I uncomment the stylesheet line, /test.xml#baz does nothing. As though the transformation somehow loses some data about elements' identification.
Any ideas? Thanks.
Well the XSLT/XPath data model does not include any DTD and thus your result tree that XSLT creates is a copy of the input without the DTD, thus there is no definition of any ID attributes in the result tree and Firefox has no way of establishing to which element with which attribute #some-id refers.
Usually if you use client-side XSLT in the browser the target format is (X)HTML or SVG or a mix of both where id attributes are known by the browser implementation without needing a DTD. If you want to transform to a result format unknown to the browser then I don't think there is a way to use DTDs for the result tree in Firefox/Mozilla. And I am not sure whether they ever implemented xml:id support so that you could use that instead of defining your own ID attributes.
Martin Honnen's mention of XHTML resulted in experimentation during which I found out that setting the target element's namespace to XHTML's, xmlns='http://www.w3.org/1999/xhtml', does the trick. It doesn't seem very clean, but it doesn't seem as grave as, for instance, setting the whole doctype to XHTML's. So test.xml is now:
<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type='text/xsl' href='test.xsl'?>
<foo xmlns:html='http://www.w3.org/1999/xhtml' xml:lang='en-GB'>
<html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/><html:br/>
<html:bar id='baz'>Baf.</html:bar>
</foo>
Also relevant might be http://xmlplease.com/xhtmlxhtml I found.
Thanks, all.
I declared rel="value" attribute for <li> element in DTD like this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd" [<!ATTLIST li rel CDATA #IMPLIED>]>
After that my code with <li rel="value"></li> was valid, but I got another issue: the browser renders the "]>" characters in the document.
How to fix this?
You should not use an internal subset in a doctype declaration, because browsers do not understand it, or DTDs at all.
If you use a simple added attribute, for some reason, it is often best to just be careful enough with it, or “check it manually”. But to perform DTD-based validation, you would need to construct an external DTD, based on the DTD you wish to use as basis, and with the extra markup added into it. In this case, you would copy the HTML 4.01 Transitional DTD and replace
<!ATTLIST LI
%attrs;>
by
<!ATTLIST LI
rel CDATA #IMPLIED
%attrs;>
(That is, you need to provide the full list of allowed attributes, with your custom attribute added, instead of declaring an attribute list that only allows your attribute [unless that’s what you really want].)
You would then use a doctype declaration that refers to your modified copy by its URL, with
<!DOCTYPE HTML SYSTEM "dtdurl">
where dtdurl is an absolute URL for the DTD. More info: Creating your own DTD for HTML validation.
It is generally not advisable to add attributes of your own, as they may clash with attributes that might be added to HTML in some future version. According to HTML5 drafts, attributes with names starting with data- are meant for site-specific use and will never have any publicly defined meaning, so data-rel would be safer than rel.
Browsers don't understand embedded SGML. They simply stop reading the doctype at the first > character. So they see the following ]> as text to be rendered.
Just don't use embedded SGML.
Use a pseudo-attribute &gt; delimiter instead of a literal > delimiter to escape the nested > within ]>:
<!DOCTYPE HTML PUBLIC
"-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd"
[<!ATTLIST li rel CDATA #IMPLIED&gt;]>
References
Pseudo-Attributes
I am trying to transform some HTML files to my own XML-format via XSL.
For this purpose I use HTML Tidy to clean up the input files, then transform them to XHTML with html2xhtml, and then use an XSL script with msxsl to transform the XHTML files to my own format.
However, the last step fails without any error message (it is a semantic failure, not a technical one ;-)): my output file just contains empty tags.
I had a problem like this before and removed the xmlns attribute from the html tag, which causes nearly all of the online transformers to work with my files correctly. MSXSL now writes the following error message: "Use of default namespace declaration attribute in DTD not supported".
Find the files I use here: http://pastie.org/5483087
Thank you in advance!
Well that is the FAQ with XSLT and XPath 1.0, the elements in your input XHTML document are in a namespace and your XSLT does not take that into account. You need to change it to e.g.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="xhtml">
<xsl:template match="/">
<stellenausschreibung>
<hochschule><xsl:value-of select="//xhtml:div[@id='contentText']/xhtml:img/@alt" /></hochschule>
<anbieter><xsl:value-of select="//xhtml:p[@id='ad_employer']" /></anbieter>
<typ><xsl:value-of select="//xhtml:h1" /></typ>
<bewerbungsschluss><xsl:value-of select="//xhtml:span[@id='ad_bewerbungsschluss']" /></bewerbungsschluss>
<erscheinungsdatum><xsl:value-of select="//xhtml:span[@class='job_published_at']" /></erscheinungsdatum>
<inhalt><xsl:value-of select="//xhtml:p[@id='ad_job']" /></inhalt>
</stellenausschreibung>
</xsl:template>
</xsl:stylesheet>
The prefix (in my example xhtml) for the XHTML namespace used in the stylesheet can of course be freely chosen but it is necessary to use one as with XSLT/XPath 1.0 a path of e.g. //p always selects p elements in no namespace.
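The same effect can be reproduced outside XSLT. The sketch below is an illustration, not the asker's setup: it uses Python's xml.etree.ElementTree, whose path language is only a small XPath subset, and a made-up one-paragraph page with invented sample text. It shows an unprefixed path matching nothing and a prefixed path matching the element.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical input: a tiny XHTML page with the default namespace declared.
page = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p id="ad_employer">Example University</p>'
    "</body></html>"
)
root = ET.fromstring(page)

# An unprefixed name only matches elements in no namespace, so this finds nothing.
unqualified = root.findall(".//p[@id='ad_employer']")

# Binding a prefix to the XHTML namespace makes the same path match.
qualified = root.findall(".//xhtml:p[@id='ad_employer']", {"xhtml": XHTML_NS})
```

As in the stylesheet, the prefix name itself is arbitrary; only the namespace URI it is bound to matters.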
I am trying to convert an HTML file into XML file using XSLT (Using Oxygen 9.0 for transformation).
When I configure and run the XSLT transformation with the HTML file then Oxygen outputs
The entity 'nbsp' was referenced,but not declared.
My input html file is:
<div><span>&nbsp;some text</span></div>
Note: I want to know how handle that entity only using the XSLT, I don't want to make any changes to the input file.
You could use XML Entities to create an XML file that defines the nbsp entity, and includes the (broken) XML fragment.
For example, assume that your fragment is saved as a file called: "invalid.xml"
<div><span>&nbsp;some text</span></div>
Create an XML file like this:
<!DOCTYPE wrapper [
<!ENTITY nbsp "&#160;">
<!ENTITY invalid-xml-document SYSTEM "./invalid.xml">
]><wrapper>
&invalid-xml-document;</wrapper>
When that file gets parsed, it will have defined the nbsp entity, include the content from "invalid.xml", and resolve the nbsp entity properly. The result is this:
<wrapper>
<div>
<span> some text</span>
</div>
</wrapper>
Then, just adjust your XSLT to accommodate the new document element (in this example the element <wrapper>).
As far as I know, you're going to need to make changes to the input file.
Either by changing your &nbsp; to &#160; or by declaring a custom doctype that will do the conversion for you:
<!DOCTYPE doctypeName [
<!ENTITY nbsp "&#160;">
]>
This is because &nbsp; isn't one of XML's predefined entities.
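If the file on disk must stay untouched, the doctype can be prepended in memory just before parsing. This is a minimal sketch of that idea using Python's standard xml.etree.ElementTree rather than Oxygen or an XSLT processor, so it is a pre-processing step under stated assumptions, not an XSLT-only solution.

```python
import xml.etree.ElementTree as ET

# The fragment as it appears in the input file, left unmodified.
fragment = "<div><span>&nbsp;some text</span></div>"

# Declare nbsp in an internal subset placed in front of the fragment.
doctype = '<!DOCTYPE div [<!ENTITY nbsp "&#160;">]>'

root = ET.fromstring(doctype + fragment)
span_text = root.find("span").text  # starts with U+00A0, a no-break space
```

The doctype name (div here) only has to match the fragment's root element for validating parsers; for a non-validating parse it is not checked.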
Following declaration appears in html 4.01 dtds
<!ELEMENT STYLE - - %StyleSheet -- style info -->
(see http://www.w3.org/TR/REC-html40/sgml/dtd.html it's in both strict.dtd and loose.dtd)
Apparently, the ; is missing after %StyleSheet. The reference should have been %StyleSheet;
But this is the official dtd of the holy html - by far the most important dtd of all dtds - so what's going on there? Is it valid entity reference like that?
It is valid without the semicolon in HTML 4.01 DTDs. Here's an extract from the W3C's HTML 4.01 Specification - On SGML and HTML:
... Instances of parameter entities in a DTD begin with "%", then the parameter entity name, and terminated by an optional ";".
In an XHTML DTD it wouldn't be valid; they follow this recommendation (because XHTML is XML): Extensible Markup Language (XML) 1.0 (Fifth Edition) - Character and Entity References:
... Definition: Parameter-entity references use percent-sign (%) and semicolon (;) as delimiters.