XML or HTML? Help needed

I am new to coding. I have a file called index.html. Its first two lines are given below.
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
Is it an html or xml file?

HTML5 defines an XML serialisation. Previously, XHTML 1.0 defined a reformulation of HTML 4 in XML.
So the file is XML. Since its root element is in the XHTML namespace, it is also either HTML (in the XML serialisation) or XHTML.

Nice question. First of all, the first line is a pure XML declaration. The second line is the HTML root element, and its attributes carry metadata: the XHTML namespace (xmlns) and the document language (lang and xml:lang).
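One way to check this yourself, sketched with Python's standard-library xml.etree.ElementTree: if the file parses as XML and its root element is html in the XHTML namespace, it is XML and (X)HTML at the same time. The head and body below are a hypothetical completion of the two lines from the question, added only so the document is whole.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The first two lines come from the question; the <head>/<body> part is a
# hypothetical completion so that the document is whole.
doc = """<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head><title>Example</title></head>
<body><p>Hello</p></body>
</html>"""


def describe(markup: str) -> str:
    """Parse markup as XML and report whether its root is the XHTML html element."""
    # fromstring raises ET.ParseError if the markup is not well-formed XML.
    root = ET.fromstring(markup.encode("utf-8"))
    # ElementTree spells namespaced names as {namespace-uri}localname.
    if root.tag == f"{{{XHTML_NS}}}html":
        return "XML with an XHTML root"
    return "XML, but not (X)HTML"


print(describe(doc))  # XML with an XHTML root
```

A file that is plain HTML (not well-formed XML) would make fromstring raise instead, which is the practical difference between the two.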

Related

Parsing an XML file with multiple <?xml> tags using Node.js/Express/xml2js

My problem is as follows:
I'm downloading an XML file using Express.js and then parsing it. Right now the file looks something like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE item [ ]>
<item lang="EN" >
<country>US</country>
<doc-number>123123123</doc-number>
<kind>A1</kind>
<date>20191017</date>
</item>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE item [ ]>
<item lang="EN" >
<country>US</country>
<doc-number>0938409384</doc-number>
<kind>A2</kind>
<date>20191018</date>
</item>
I'm using the xml2js library and I'm having trouble getting the entire document. My code looks something like this:
const { parseString } = require('xml2js');
parseString(xml, function (err, result) {
  if (err) { return console.error(err); } // report parse errors instead of ignoring them
  console.log(result);
});
The output contains only the first piece of XML. How can I parse this so that I get an array of <item>s?
My first idea is to loop through the doc as a string and split it based on <?xml version="1.0" encoding="UTF-8"?> and parse the data that way.
Thanks!
An XML document can contain at most one XML declaration, and it must have exactly one root element.
Therefore, what you have provided is, in principle, two separate XML documents concatenated together. It is not well-formed, so most parsers or APIs will either reject it or, as you are seeing, return only the first part.
Do you have any control over how the document is generated? If so, you should make sure it contains a single XML declaration and a single root element, something like:
<?xml version="1.0" encoding="utf-8"?>
<items>
<item>…</item>
<item>…</item>
</items>
If you have no control over the generation, you should probably split the input and parse each document separately, or strip the extra declarations and wrap the items in a single root element to produce a document like the one above.
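The split-then-parse route (which is also the asker's first idea) can be sketched as follows. This is Python rather than Node.js, using only the standard library (re and xml.etree.ElementTree); the regex assumes an XML declaration never appears inside a comment or CDATA section, which holds for the sample data but is not guaranteed in general.

```python
import re
import xml.etree.ElementTree as ET

# Matches an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
# (the \s keeps it from matching other PIs like <?xml-stylesheet ...?>).
XML_DECL = re.compile(r"<\?xml\s[^?]*\?>")


def parse_concatenated(text: str) -> list[dict]:
    """Split text at each XML declaration and parse every piece as its own document."""
    items = []
    for chunk in XML_DECL.split(text):
        chunk = chunk.strip()
        if not chunk:
            continue  # the empty string before the first declaration
        root = ET.fromstring(chunk)  # one <!DOCTYPE ...> plus one <item> root
        # Map each child element (country, doc-number, ...) to its text.
        items.append({child.tag: child.text for child in root})
    return items


data = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE item [ ]>
<item lang="EN" >
<country>US</country>
<doc-number>123123123</doc-number>
<kind>A1</kind>
<date>20191017</date>
</item>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE item [ ]>
<item lang="EN" >
<country>US</country>
<doc-number>0938409384</doc-number>
<kind>A2</kind>
<date>20191018</date>
</item>"""

for item in parse_concatenated(data):
    print(item["doc-number"], item["kind"])
```

The same split works in Node.js: parse each chunk with parseString and collect the results into an array.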

Is it okay to have a <!-- comment --> as first line in SVG?

Can you have an XML comment as the first line in an SVG file? For example:
<!-- Timestamp 1434061994 -->
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg...
Is this against the SVG spec? Could it ever fail validation, or cause any problems I'm not seeing when I use it on a website?
Yes, it is ok to have a comment be the first line in an SVG file, but only if there is no XML declaration (<?xml version="1.0" encoding="UTF-8"?>).
Nothing can appear before the XML declaration in an XML file. At most one XML declaration can appear in a file, and if an XML declaration is used, then it must be at the very top of the file. Anything before an XML declaration, including a comment, prevents the XML from being well-formed and should result in an error such as the following diagnostic by Xerces-J:
The processing instruction target matching "[xX][mM][lL]" is not allowed.
If a comment appears before the XML declaration, then the XML is not well-formed, and if the XML is not well-formed, the SVG is not conforming.
Final note: An XML declaration is optional. Unless you want to specify a version other than 1.0 or an encoding other than UTF-8, you don't have to have an XML declaration in an XML (or SVG) file.
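The three orderings discussed above can be checked with Python's standard-library parser. It uses expat rather than Xerces-J, so the error message differs, but it rejects the same input. The <svg> element is a minimal placeholder (the question elides the real one), and the DOCTYPE is left out to keep the sketch short.

```python
import xml.etree.ElementTree as ET

DECL = '<?xml version="1.0" encoding="UTF-8"?>'
COMMENT = "<!-- Timestamp 1434061994 -->"
# A minimal stand-in for the real <svg ...> element, which the question elides.
SVG = '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"/>'


def is_well_formed(markup: str) -> bool:
    """Return True if markup parses as XML, False on a well-formedness error."""
    try:
        ET.fromstring(markup.encode("utf-8"))
    except ET.ParseError:
        return False
    return True


print(is_well_formed(COMMENT + "\n" + DECL + "\n" + SVG))  # False: comment before declaration
print(is_well_formed(DECL + "\n" + COMMENT + "\n" + SVG))  # True: comment after declaration
print(is_well_formed(COMMENT + "\n" + SVG))                # True: no declaration at all
```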

How to replace &nbsp; with &#160; in an HTML file

I want to replace all the &nbsp; with &#160; in my HTML file to support an XML parser.
But I don't want to replace them directly; I'd like to add an entity declaration in <!DOCTYPE >, like below:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"[<!ENTITY nbsp "&#160;">]>
<html><head></head><body><div>Hello World!</div></body></html>
But when I view the file, there is an extra ]> at the top of the document.
Does anyone know how to deal with it?
Thanks!
What you have is a valid way to include an entity declaration in an internal subset. Otherwise, though, the document is not valid, as you can check with the W3C Markup Validator: the required xmlns attribute on the html element is missing, and so is the required title element.
When served as text/html, the document is processed the way browsers process HTML documents, which means, among other things, that internal subsets are not recognized; in fact, document type definitions are not read at all. Instead, doctype declarations are just taken as magic strings: some strings trigger “quirks mode”, some don’t. The doctype declaration is parsed in a simplistic manner, which makes the first “>” terminate it, so whatever comes after it is taken as character data.
The moral is that entity declarations just don’t work with “HTML”, internally or externally, when “HTML” means sending something to a browser and telling it (in HTTP headers) that it is text/html, which is what servers normally do when they send .html files.
Served as application/xhtml+xml and fixed to conform to XHTML syntax, your approach works on conforming browsers (online demo: http://www.cs.tut.fi/~jkorpela/test/nbsp.xhtml):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
[<!ENTITY nbsp "&#160;">]>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Entity demo</title></head>
<body>
<div>Hello World!</div>
</body>
</html>
However, IE 8 and earlier don’t process XHTML served as application/xhtml+xml (the browser just opens a “Save As” dialog).
The conclusions depend on what you are doing and why (and in which sense) you need to “support XML parser”. It’s not really about parsing but about entity declarations. XHTML user agents are not required to recognize the entities predefined in HTML (only those predefined in XML), but has that possibility actually materialized in your case? And in general, it is better to convert to actual no-break space characters (U+00A0) than to character references.
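To illustrate the two routes with an actual XML parser, here is a sketch using Python's xml.etree.ElementTree: one keeps &nbsp; and declares it in the internal subset (the public and system identifiers are left out for brevity), the other replaces it with the no-break space character U+00A0 before parsing. Both yield the same text. The plain str.replace assumes &nbsp; only occurs as an entity reference; it would also rewrite occurrences inside comments or CDATA sections.

```python
import xml.etree.ElementTree as ET

X = "{http://www.w3.org/1999/xhtml}"  # ElementTree's prefix for XHTML element names

BODY = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Entity demo</title></head>
<body><div>Hello&nbsp;World!</div></body>
</html>"""

# Route 1: keep &nbsp; and declare it in the internal DTD subset.
with_decl = '<!DOCTYPE html [<!ENTITY nbsp "&#160;">]>\n' + BODY

# Route 2: replace the reference with the no-break space character itself.
replaced = BODY.replace("&nbsp;", "\u00a0")


def div_text(markup: str) -> str:
    """Parse markup as XML and return the text of the first body/div."""
    root = ET.fromstring(markup)
    return root.find(f"{X}body/{X}div").text


print(repr(div_text(with_decl)))  # 'Hello\xa0World!'
print(repr(div_text(replaced)))   # 'Hello\xa0World!'
```

Parsing BODY unchanged would fail with an undefined-entity error, since XML itself predefines only amp, lt, gt, quot and apos.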
here
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"[<!ENTITY nbsp "&#160;">

Remove warning in MyEclipse

How can I modify the conditions for which MyEclipse will throw up warning flags? I'd be happy to hear a generic solution, but here is my specific problem for the curious/if it turns out to be relevant:
<html xmlns="http://www.w3.org/1999/xhtml">
<wicket:panel>
<p>
<object type="text/html" width="750" height="360" wicket:id="htmlRendition"></object>
</wicket:panel>
</html>
causes the warnings "Undefined attribute name (xmlns)", "Unknown tag (wicket:panel)" and "Undefined attribute name (wicket:id)". Oddly, most HTML files paired with Wicket Java files produce no warnings; only files named in the form ClassName$InnerClassName.html do.
I use the following in my HTML files for Wicket:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:wicket="http://wicket.apache.org/dtds.data/wicket-xhtml1.4-strict.dtd">
I know some IDEs (IntelliJ, for example) allow you to register a DTD to validate your XML files. This article appears to apply to XML documents, but perhaps HTML files work, or can be configured to work, similarly:
http://help.eclipse.org/ganymede/index.jsp?topic=/org.eclipse.wst.xmleditor.doc.user/topics/cxmlcat.html
In the project properties, you can turn off different types of validation. For example, you can say that you don't want DTD validation of XML files, or HTML validation, etc.
Myeclipse example is here.

Can I make an XSLT transformation directly in an HTML page?

I know namespaces are used for description, like a doctype, but is there a way or a trick to transform the namespaced content inside an HTML page with an XSL stylesheet, using an XSD?
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:sample="sample-uri">
<head >
<title>Enter the title of your XHTML document here</title>
</head>
<body >
<p sample:node="retrieve-transformation">Enter the body text of your XHTML document here</p>
</body>
</html>
In other words, I want to know whether I can apply an XSL transformation to an XHTML page without using JavaScript.
In XHTML (i.e. application/xhtml+xml—not text/html!), you can trigger an XSLT program without JavaScript using the xml-stylesheet processing instruction.
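Adding that processing instruction to an existing XHTML file can be done with any XML API; below is a sketch using Python's xml.dom.minidom. The stylesheet name transform.xsl is hypothetical, and the DOCTYPE from the question is left out to keep the example self-contained. A browser only applies the instruction when the file is served as XML (for example application/xhtml+xml).

```python
from xml.dom import minidom


def add_stylesheet_pi(xhtml: str, href: str) -> str:
    """Insert <?xml-stylesheet type="text/xsl" href="..."?> before the root element."""
    doc = minidom.parseString(xhtml.encode("utf-8"))
    pi = doc.createProcessingInstruction("xml-stylesheet", f'type="text/xsl" href="{href}"')
    # The instruction must sit in the prolog, i.e. before the root element.
    doc.insertBefore(pi, doc.documentElement)
    return doc.toxml()


page = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:sample="sample-uri">
<head><title>Enter the title of your XHTML document here</title></head>
<body><p sample:node="retrieve-transformation">Enter the body text here</p></body>
</html>"""

# "transform.xsl" is a hypothetical stylesheet file name.
print(add_stylesheet_pi(page, "transform.xsl"))
```

The stylesheet itself would then match the elements or attributes in the sample-uri namespace and output the transformed XHTML.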
Modern browsers support XSLT out of the box.
Take a look at eu.wowarmory.com; they use it extensively. If the server detects a user agent that does not support XSLT, the page is rendered on the server side, and rather verbose HTML is generated there and sent to the browser.
This makes a nice abstraction if you plan to provide an XML webservice similar to the web site.
No, you cannot perform an XSL transformation without using some kind of scripting technology. I would suggest doing it server-side to save the client the trouble, and to avoid problems if the transformation fails on the client for some reason or runs slowly.