libxml2 fails to handle CDATA in HTML correctly - html

I'm using libxml2 2.7.3 to parse HTML pages and I'm having difficulty getting it to work correctly with CDATA in HTML. Here's the code:
xmlDocPtr doc = htmlReadMemory(data, length, "", NULL, 0);
xmlBufferPtr buffer = xmlBufferCreate();
xmlNodeDump(buffer, doc, doc->children, 0, 0);
printf("%s", (char*)buffer->content);
and the HTML data:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html><body>
<div>
<script type="text/javascript">
//<![CDATA[
document.write('</div>');
//]]>
</script>
</div>
</body></html>
The parser erroneously recognizes the </div> inside the quotes as a real html tag and prints out error messages as follows:
:8: HTML parser error : Unexpected end tag : script
</script>
^
:9: HTML parser error : Unexpected end tag : div
</div>
^
And the result printed out and debugging also imply that parsing went wrong:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html><body>
<div>
<script type="text/javascript"><![CDATA[
//<![CDATA[
document.write(']]></script></div>');
//]]>
</body></html>
So the question is: is this a bug in libxml2, or am I doing something wrong?
Any insightful advice would be greatly appreciated.
Thanks!

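In HTML 4, the content of a <script> element is CDATA by definition, so the <![CDATA[ marker has no effect there. The element's content ends at the first </ followed by a letter, which means the </div> inside the string literal closes the script early as far as an HTML 4 parser is concerned.
In short, the source document is broken.
That section would be more properly written with the slash escaped: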
<script type="text/javascript">
document.write('<\/div>');
</script>
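
If the script text is generated rather than written by hand, the same escaping can be applied mechanically. The following is only a minimal sketch: escapeEndTags is a hypothetical helper name, not part of libxml2 or any other library, and it assumes every "</" in the source sits inside a JavaScript string literal, where "<\/" means the same thing.

```javascript
// Hypothetical helper (not part of any library): escape every "</" so the
// text can sit inside an inline <script> without ending the element early.
// Assumes each "</" occurs inside a JavaScript string literal.
function escapeEndTags(source) {
  return source.replace(/<\//g, "<\\/");
}

const original = "document.write('</div>');";
const escaped = escapeEndTags(original);
console.log(escaped); // document.write('<\/div>');
```

An HTML 4 parser then sees no "</" before the real </script>, while the JavaScript engine still writes </div>.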

Related

Why is it not possible to dynamically update the HTML DOM structure when `<!DOCTYPE html>` is specified?

I have this HTML code:
<!DOCTYPE html>
<html>
<head>
<title>Clear a Timer</title>
<meta charset="utf-8" />
<script>
var theTimer, xPosition = 0, theImage;
function doTimer() {
theImage = document.getElementById("courseraLogo");
xPosition = xPosition + 1;
theImage.style.left = xPosition;
}
</script>
</head>
<body onload="theTimer = setInterval(doTimer, 50)">
<img src="../img/coursera.png" id="courseraLogo"
style="position:absolute; left:0">
<button onclick="clearTimeout(theTimer);">
Stop!
</button>
</body>
</html>
The code is supposed to move an image from left to right at an interval of 50ms. It does not work if I specify the DOCTYPE tag: the image does not move. Why is this happening? Is there a compatibility issue related to the HTML version? Or do I need to use a method similar to setInterval that is compatible with HTML5?
You need to include the units when setting style.left:
theImage.style.left = xPosition + "px";
This isn't, strictly speaking, an HTML5 thing. Omitting the units works only in quirks mode, which is what you get with no doctype at all. An older doctype such as <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> (an XHTML 1.0 doctype, not HTML5) also takes the browser out of quirks mode, so your script will fail there too unless you include the units.
(I tested this in current versions of Safari, Chrome, and Firefox on OS X; all three behaved identically.)
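
As a rough sketch of the fix, the unit can be appended in one small helper so it is never forgotten. The names toPx and makeMover are made up for illustration, and a plain object stands in for element.style so the snippet also runs outside a browser.

```javascript
// Stand-in for element.style so the sketch runs outside a browser (assumption).
const style = { left: "0" };

// Append the unit explicitly; outside quirks mode a bare non-zero number
// is not accepted as a CSS length.
function toPx(value) {
  return value + "px";
}

// Returns a tick function that moves the target `step` pixels to the right.
function makeMover(targetStyle, step) {
  let x = 0;
  return function tick() {
    x += step;
    targetStyle.left = toPx(x);
    return x;
  };
}

const tick = makeMover(style, 1);
tick();
tick();
console.log(style.left); // 2px
```

In the page itself, tick would be passed to setInterval(tick, 50) with the image's real style object, and clearInterval is the matching call for stopping it.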

Sublime Text 2, Emmet and html snippet not expanding

I have read two answers on here for a similar issue: one mentions an example with a colon, another mentions an example where there is a popup. I am new to Emmet. This one is just html<tab>.
When emmet is disabled, I can see it expands html<tab> based on the file html.sublime-snippet so html<tab> becomes
<html>
<head>
<title></title>
</head>
<body>
</body>
</html>
But with Emmet enabled, html<tab> just expands to <html></html>, like any other tag.
I assumed that Emmet ignores html.sublime-snippet and uses its own snippets file, snippets.json.
So I added an html line to the relevant-looking part of snippets.json.
I added this line at the end of the snippets block:
"html": "<html>\n<head>\n<title></title>\n</head>\n<body>\n</body>\n</html>"
"html": {
"filters": "html",
"profile": "html",
"snippets": {
"!!!": "<!DOCTYPE html>",
"!!!4t": "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\">",
"!!!4s": "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/strict.dtd\">",
"!!!xt": "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">",
"!!!xs": "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">",
"!!!xxs": "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">",
"c": "<!-- |${child} -->",
"cc:ie6": "<!--[if lte IE 6]>\n\t${child}|\n<![endif]-->",
"cc:ie": "<!--[if IE]>\n\t${child}|\n<![endif]-->",
"cc:noie": "<!--[if !IE]><!-->\n\t${child}|\n<!--<![endif]-->",
"html": "<html>\n<head>\n<title></title>\n</head>\n<body>\n</body>\n</html>"
},
But when I type html<tab>, it still expands to <html></html> rather than the full skeleton with the nested tags that I get when Emmet is disabled.
Similarly with script: with Emmet disabled it expands to <script type="text/javascript"></script>, whereas with Emmet enabled it only expands to <script></script>.

How to render a FusionCharts graph in JSP using setXMLData?

I'm currently using FusionChartsFree in a small internal application, and I have a small piece of HTML like this:
<html>
<head>
<title>My First chart using FusionCharts XT</title>
<script type="text/javascript" src="FusionCharts.js"></script>
</head>
<body>
<div id="chartContainer">FusionCharts XT will load here!</div>
<script type="text/javascript">
var myChart = new FusionCharts( "Line.swf", "myChartId", "400", "300");
var strXML = "<chart caption='Critical' xAxisName='month' yAxisName='Count' yAxisMinValue ='40' showValues= '0'><set label = 'month1' value='55'/><set label = 'month2' value='55'/><set label = 'month3' value='55'/><set label = 'month4' value='55'/></chart>" ;
myChart.setXMLData(strXML);
myChart.render("chartContainer");
</script>
</body>
</html>
The above code works perfectly and renders a graph. Now, I'm trying to do the same thing using JSP as below :
<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Graphs</title>
<script type="text/javascript" src="FusionCharts.js"></script>
</head>
<body>
<%
String data="<chart caption='Minor' xAxisName='month' yAxisName='Count' yAxisMinValue ='66500' showValues= '0'>"+"\n"+"<set label = 'month1' value='66560'/>"+"\n"+"<set label = 'month2' value='66560'/>"+"\n"+"<set label = 'month3' value='66647'/>"+"\n"+"<set label = 'month4' value='66631'/>"+"\n"+"</chart>";
System.out.println(data);
%>
<div id="chartContainer1" align="left" style="margin-top: 22px; padding-top: 310px;">blocker data</div>
<script>
var blocker = new FusionCharts("Line.swf", "myChartId1", "400", "300");
var strXML1="<%=data%>";
blocker.setXMLData(strXML1);
blocker.render("chartContainer1");
</script>
</body>
</html>
The problem comes when I generate the "data" String dynamically: I do not get any output. Please help.
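The problem is the extra "\n" in the XML string in your JSP page. Remove the "\n" and check again; it will work fine.
In Java, "\n" is a real line break, so <%=data%> writes the XML across several lines inside the quotes of var strXML1="...";. A JavaScript string literal cannot span lines, so the whole script block fails to parse and nothing is rendered. When you pass data using the setXMLData() function, give it the XML as a single-line string without explicit line breaks.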
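
The effect can be reproduced without FusionCharts at all. This is a small sketch, not the library's behaviour: compiles is a hypothetical helper that asks the JavaScript engine to parse a piece of source, and the XML is a shortened stand-in for the chart data above.

```javascript
// Hypothetical helper: true if the engine can parse `source` as a function body.
function compiles(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    return false; // a raw line break inside a string literal ends up here
  }
}

// Shortened stand-in for the chart XML, with real line breaks as the JSP emits them.
const xml = "<chart>\n<set label='month1' value='55'/>\n</chart>";

// What the browser receives when the JSP prints the string as-is.
const broken = 'var strXML1="' + xml + '";';
// The same statement once the line breaks are stripped.
const fixed = 'var strXML1="' + xml.replace(/\r?\n/g, "") + '";';

console.log(compiles(broken)); // false
console.log(compiles(fixed));  // true
```

In the JSP itself the equivalent fix is simply to leave the "\n" pieces out of the Java string.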

Find a C# HTML parser that finds all <script> elements and gives me the line and position info

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>title</title>
</head>
<body>
I want to get this text
<script>
var test=function()
{}
</script>
</body>
</html>
and the result is:
line:7,
position :4
content:
var test=function()
{}
Have you tried the HTML Agility Pack?
This typically works quite well and gives you a nice intuitive interface into parsing HTML content.
You should be able to use it something like this:
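// Requires: using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load("yourfile.html");
// Note: SelectNodes returns null when no node matches the XPath.
foreach (HtmlNode script in doc.DocumentNode.SelectNodes("//script"))
{
    // script.Line and script.LinePosition give the location,
    // script.InnerText gives the content
}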

HTML5 + Jscript with JQuery, setInterval problem

Please can someone tell me why this isn't working before I defenestrate everything on my desk.
I have the following html document:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html"; charset=utf-8 />
<title>Sample Gauge using JQuery</title>
</head>
<body>
<canvas id="gauge" width="200" height="200">
Your web browser does not support HTML 5 because you fail.
</canvas>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="gauge.js"></script>
</body>
</html>
And gauge.js:
$(document).ready(function () {
//(canvas gets assigned)
startRendering();
function startRendering() {
setInterval("render();", 1000);
}
function render() {
renderBackground();
renderNeedle(-172);
}
//(Defined functions for rendering a gauge)
});
render() does not get called, but if I change the setInterval line to just 'render()' it does.
Also, if I change setInterval() to contain something like "alert('LOL')" then it does work, it just doesn't seem to work with functions I have defined.
With or without a semicolon at the end of the function(s) to call makes no difference, nor does prefixing this. to my functions.
I'm trying to get this working so I can start using it to animate the gauge. Can anyone see why it isn't working?
I hate web development.
Change
setInterval("render();", 1000);
To
setInterval(render, 1000);
When you pass a string to setInterval(), the code inside is evaluated in the global scope, not the current one. Your render() is defined inside the $(document).ready callback, so it cannot be found from there. It's much more appropriate to just pass the function anyway. Functions can be passed around just like any other variable if you leave the parentheses off.
I'm not sure why using just "render()" doesn't work but using this code will fix the problem.
function startRendering() {
setInterval(function(){
render()
}, 1000);
}
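
The scoping difference can be shown without a timer. In this sketch an indirect eval stands in for the string form of setInterval, on the assumption that both evaluate their code in the global scope; setup plays the role of the $(document).ready callback and is a made-up name.

```javascript
// setup() plays the role of the $(document).ready callback (illustrative name).
function setup() {
  let calls = 0;

  // Local to setup(), just like render() inside the ready callback.
  function render() {
    calls += 1;
  }

  // Indirect eval runs in the global scope, standing in for a string timer:
  // the local render() is not visible from there.
  const seenFromGlobal = (0, eval)("typeof render");

  // A function reference keeps its closure, so it can be called from anywhere.
  const callback = render;
  callback();

  return { seenFromGlobal, calls };
}

const result = setup();
console.log(result.seenFromGlobal); // undefined
console.log(result.calls);          // 1
```

Passing render itself, or an anonymous function that calls it, hands the timer the second kind of value.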