Writing HTML back to file

The header of my input file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
After making some changes to the document using the lxml.html parser, I need to save the result to a file. When I do, the header of the file changes to:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<?xml version="1.0" encoding="UTF-8"??>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" xml:lang="en">
I am not sure why this is happening. It may be something silly, but I am stuck here. Please help!

Maybe the parser is catering to IE6, where anything placed before the DOCTYPE triggers quirks mode?
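One possible way around this, assuming the file is well-formed XHTML, is to parse it with an XML parser rather than an HTML parser. An HTML parser has no notion of an XML declaration, while an XML parser keeps the declaration, the DOCTYPE and the namespace attribute in place. Below is a minimal sketch using Python's standard-library xml.dom.minidom; within lxml, lxml.etree would presumably be the analogous choice, but that is not shown or tested here. The sample document and the `roundtrip` helper are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical sample input with the same header as in the question.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>Example</title></head>
<body><p>Hello</p></body>
</html>"""


def roundtrip(xhtml_bytes):
    """Parse XHTML as XML, make one example change, and serialize it again."""
    doc = minidom.parseString(xhtml_bytes)
    # Example edit: replace the text of the first <p> element.
    paragraph = doc.getElementsByTagName("p")[0]
    paragraph.firstChild.data = "Changed"
    # toxml(encoding=...) returns bytes: the XML declaration first, then the
    # DOCTYPE (minidom writes its identifiers in single quotes), then <html>.
    return doc.toxml(encoding="UTF-8")


output = roundtrip(SAMPLE)
print(output.decode("utf-8"))
```

Since `output` is a bytes object, it can be written to a file opened in binary mode. This approach only works while the input stays well-formed XML; tag-soup HTML would still need an HTML parser.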

Related

Message "Element meta is not allowed here" in PHPStorm

Can someone explain to me why I'm not allowed to use the <meta> tag?
I always use it to ensure UTF-8 encoding.
My code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<meta charset="utf-8">
My IDE (PHPStorm) throws the following message:
Element meta is not allowed here
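Put the meta tag inside the head; the only children of the html element should be head and body.
Also note that <meta charset="utf-8"> is HTML5 syntax. The XHTML 1.0 Strict DTD does not define a charset attribute on meta, and in XHTML the element has to be self-closed, so use the http-equiv form instead.
Try this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
</html>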
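XHTML is XML, so every element has to be closed. One quick way to see the difference between an HTML-style and an XHTML-style meta element is to run both through an XML parser. This is only a well-formedness check, not validation against the DTD, so it catches an unclosed tag but would not flag an attribute the DTD does not define. The `is_well_formed` helper and the two sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

HEADER = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">\n'
)

# HTML-style: the meta element is never closed.
html_style = HEADER + '<head><meta charset="utf-8"></head></html>'

# XHTML-style: self-closed element using the http-equiv form.
xhtml_style = HEADER + (
    '<head><meta http-equiv="Content-Type" '
    'content="text/html; charset=utf-8" /></head></html>'
)


def is_well_formed(text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(html_style))   # False: </head> arrives while <meta> is still open
print(is_well_formed(xhtml_style))  # True
```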

XSL to XHTML Strict with DOCTYPE - META tag validation issue

XSL Sample
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0"
>
<xsl:output method="html"/>
<xsl:template match="/Report">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
The problem: I need to define the DOCTYPE
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"DTD/xhtml1-strict.dtd">
in the XSL, but it won't let me put it in there; it says the style sheet is invalid.
I tried:
<xsl:text disable-output-escaping='yes'>&lt;!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd"&gt;</xsl:text>
But is this the proper way to do it? Online validators don't even see that line, even though the rendered HTML is supposed to be valid.
What's the proper way of adding a DOCTYPE?
Should my xmlns:xsl="http://www.w3.org/1999/XSL/Transform" still point to the Transform namespace?
I have an XML data file with an XSL style sheet that I transform into HTML. I want to add the above to make the output XHTML Strict compliant. Any advice would be appreciated. Thank you!
Expected Output
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title> Strict DTD XHTML Example </title>
</head>
<body>
<p>
Please Choose a Day:
<br />
<br />
<select name="day">
<option selected="selected">Monday</option>
<option>Tuesday</option>
<option>Wednesday</option>
</select>
</p>
</body>
</html>
Generated XHTML
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xml:lang="en" lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>OWASP ZAP Vulnerability Report</title>
</head>
<body>
</body>
</html>
Issues when validating with https://validator.w3.org/
The issue seems to be caused by the META tag in the head element, but I don't understand why.
So how do I stop my XSL from adding the META tag?
SOLUTION
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0"
>
<xsl:output method="xml" encoding="UTF-8" indent="yes" />
<xsl:template match="/Report">
<xsl:text disable-output-escaping='yes'>&lt;!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"&gt;
</xsl:text>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
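Change the xsl:output method from html to xml. With method="html", the XSLT processor adds a META element to the head and writes it in HTML syntax (not closed, as in the generated output above), which is not valid XHTML. But back to the other question: is there a better way to declare the DOCTYPE?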
Use the doctype-public and doctype-system attributes of xsl:output (https://www.w3.org/TR/xslt#output); the XSLT processor then adds the DOCTYPE itself when serializing:
<xsl:output method="xml" encoding="UTF-8" indent="yes" doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN" doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" />

Undefined attribute name (xmlns)

Here is the beginning of my HTML page :
1. <?xml version="1.0" encoding="utf-8" ?>
2. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
3. <html xmlns="http://www.w3.org/1999/xhtml" lang="fr">
4. (...)
5. </html>
Whether or not the <?xml ... ?> part is there, Eclipse gives me a warning on line 3:
Undefined attribute name (xmlns).
The xmlns attribute is required for correct validation, so I don't understand why Eclipse gives a warning.
I'm using Eclipse Indigo 3.7.2 with the latest PHP Developer Tool from the Eclipse database.
Does anybody know how to remove this warning, or how to skip it?
Thanks for reading and helping.
There is a bug filed for a similar case here: https://bugs.eclipse.org/bugs/show_bug.cgi?id=313859
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="fr">
<head>
<title>Title of the document</title>
</head>
<body>
<div>The content of the document.</div>
</body>
</html>
The document above should validate. It validates fine in Eclipse Juno and at http://validator.w3.org.

validate html, problem in my doctype and use of lan(g)

I can't get my website validated; the validator reports an error on my doctype, which is:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//ES" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
(under Spanish)
<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
(under English)...
The html tag:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="EN" lan="en">
THE ERROR
Error: unrecognized DOCTYPE; unable to check document
Any idea why?
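Do not put your page's language in the doctype. The EN at the end of the public identifier is the language of the DTD itself, not of your document, so always use: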
<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" .......
Set the language in the <html> tag: <html lang="es">
http://www.w3.org/QA/2002/04/valid-dtd-list.html#DTD
http://www.alistapart.com/articles/doctype/
I use this one, which is the same, except for the html in lowercase. I'm not sure if that would matter.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Did you check whether there is any whitespace before the doctype?
The upper/lower case issue of "html" is described here: Recommended Doctype Declarations
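The two points raised in these answers (the public identifier has to match a known one exactly, and the name after DOCTYPE should be lowercase html in XHTML) can be sketched as a small check. This is only an illustration, not how the W3C validator works; the `doctype_problems` helper, the regular expression and the sample strings are made up, and only the three XHTML 1.0 identifiers are listed.

```python
import re

# Public identifiers of the three XHTML 1.0 DTDs. The trailing "EN" is part
# of the identifier (the language of the DTD), not the language of the page.
KNOWN_XHTML10_IDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.0 Frameset//EN",
}

DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+(\S+)\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE)


def doctype_problems(document):
    """Return a list of human-readable problems found in the DOCTYPE."""
    match = DOCTYPE_RE.search(document)
    if match is None:
        return ["no DOCTYPE with a PUBLIC identifier found"]
    root_name, public_id = match.groups()
    problems = []
    if public_id not in KNOWN_XHTML10_IDS:
        problems.append("unrecognized public identifier: " + public_id)
    if root_name != "html":
        # XHTML is XML, so the DOCTYPE name must match the lowercase root element.
        problems.append('DOCTYPE name should be lowercase "html"')
    return problems


bad = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//ES" '
       '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">')
good = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">')

print(doctype_problems(bad))
print(doctype_problems(good))
```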

W3C Validator - Document type does not allow element "body" here

I am trying to validate the following code with the W3C validator:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>Test</title>
</head>
<body>
</body>
</html>
I get two errors:
Document type does not allow element "body" here
End tag for "html" which is not finished
Does anyone know how to fix this?
You're using the Frameset DTD, which doesn't allow body. It is meant for use with framesets, which are used to display frames. You can use Strict instead:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>Test</title>
</head>
<body>
</body>
</html>
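The rule behind this answer is that the Strict and Transitional DTDs expect head followed by body inside html, while the Frameset DTD expects head followed by frameset. A minimal sketch of that check, using Python's standard-library xml.dom.minidom, might look like the following. It only looks at the direct children of html and is no substitute for a real validator; the `root_children_match_doctype` helper and the sample pages are made up for illustration.

```python
from xml.dom import minidom

# Element that must follow <head> under each XHTML 1.0 DTD.
SECOND_CHILD = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "body",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "body",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "frameset",
}

PAGE = """<!DOCTYPE html PUBLIC "{public_id}" "{system_id}">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head><title>Test</title></head>
<body></body>
</html>"""

frameset_page = PAGE.format(
    public_id="-//W3C//DTD XHTML 1.0 Frameset//EN",
    system_id="http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd",
)
strict_page = PAGE.format(
    public_id="-//W3C//DTD XHTML 1.0 Strict//EN",
    system_id="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
)


def root_children_match_doctype(xhtml_text):
    """Check that the children of <html> fit the DTD named in the DOCTYPE.

    Raises KeyError for public identifiers not listed in SECOND_CHILD.
    """
    doc = minidom.parseString(xhtml_text)
    expected = ["head", SECOND_CHILD[doc.doctype.publicId]]
    actual = [
        node.tagName
        for node in doc.documentElement.childNodes
        if node.nodeType == node.ELEMENT_NODE
    ]
    return actual == expected


print(root_children_match_doctype(frameset_page))  # False: Frameset expects <frameset>
print(root_children_match_doctype(strict_page))    # True
```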