Suppose I have an HTML document like the one below (mypage.html):
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<table>
<tbody>
...
<tr>
<td id="MY_ID">123</td>
How would I edit the element whose id is MY_ID? The following command worked when the document contained only the table, but it stopped working once the table was placed in a larger document:
xmlstarlet ed --update '//td[@id="MY_ID"]' --value '456' mypage.html
Your td element needs to be closed (</td>) for it to be valid XML.
You can try the following here:
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<table>
<tbody>
...
<tr>
<td id="MY_ID">123</td>
<td id="NOTMYID">127</td>
</tr>
</tbody>
</table>
</body>
</html>
Using your own expression:
//td[@id="MY_ID"]
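One thing worth checking, offered as a guess since the full document is not shown: the xmlns="http://www.w3.org/1999/xhtml" attribute on the root puts every element in the XHTML namespace, so an unprefixed td in the XPath may match nothing. The sketch below uses Python's xml.etree.ElementTree rather than xmlstarlet to show the effect; the prefix x is an arbitrary name picked for illustration.

```python
import xml.etree.ElementTree as ET

# Same shape as the sample document, with the default XHTML namespace.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
<body><table><tbody><tr>
<td id="MY_ID">123</td>
<td id="NOTMYID">127</td>
</tr></tbody></table></body></html>"""

root = ET.fromstring(DOC)

# Unprefixed name: looks for td in *no* namespace, so nothing matches.
plain = root.findall(".//td[@id='MY_ID']")

# Prefix bound to the XHTML namespace ("x" is an arbitrary choice).
ns = {"x": "http://www.w3.org/1999/xhtml"}
qualified = root.findall(".//x:td[@id='MY_ID']", ns)

# Update the matched cell, as the xmlstarlet command intends to.
qualified[0].text = "456"

print(len(plain), len(qualified), qualified[0].text)
```

If this is the cause, the xmlstarlet equivalent would be to bind a prefix with its -N option and use that prefix in the expression (for example //x:td[@id="MY_ID"]).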
I searched for this and found nothing, so I'm asking here.
Recently I worked on a Grails project in which nested HTML templates produced the HTML for emails, as in the following example. Each DOCTYPE+style pair corresponds to a different template, and the templates used depend on business rules:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css">
...(styles here)
</style>
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<style type="text/css">
...(more styles here)
</style>
<tr>
<td>
<table>
<tbody>
<tr>
<td class="greeting">
Hi
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td>
<table>
<tbody>
<tr>
<td>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<style type="text/css">
...(more styles here)
</style>
</td>
</tr>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<style type="text/css">
...(more styles here)
</style>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<style type="text/css">
...(more styles here)
</style>
...(and a lot more of code like this)
</td>
</tr>
</tbody>
</table>
But the HTML 4.01 standard says that <style> elements should go in the <head> of the document, not to mention that the DOCTYPE should come first in a valid HTML document.
Why does it work?
Why doesn't it need to respect the standard to work?
Why is there a standard if you can skip it this way?
Surely I'm missing something.
I learned today that the <html> root tag is optional too, like many others, but I found nothing about this particular matter.
Thanks.
Why does it work?
HTML parsers perform a lot of error recovery to handle bad HTML.
Why doesn't it need to respect the standard to work?
Because it is generally considered better to show end users a "best effort" rendering of a webpage instead of an error message.
Why is there a standard if you can skip it this way?
Because pages that follow the standard can expect more predictable behaviour in how parsers handle them (since they won't be triggering error recovery).
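As a rough illustration of that tolerance, here is a minimal sketch using Python's standard html.parser. It is only a tokenizer, not a browser engine, so it does not reproduce browser error recovery, but it shows the same "keep going" attitude: a DOCTYPE and a <style> in the middle of a table are reported as ordinary events instead of raising an error. The class name TagLogger and the fragment are made up for the example.

```python
from html.parser import HTMLParser

class TagLogger(HTMLParser):
    """Records declarations and start tags in the order they appear."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_decl(self, decl):
        # decl is the text inside <! ... >, e.g. 'DOCTYPE HTML PUBLIC ...'
        self.events.append(("decl", decl.split()[0]))

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

# A fragment in the spirit of the question: DOCTYPE and <style> inside a table.
FRAGMENT = """<table><tbody>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<style type="text/css">td { color: red }</style>
<tr><td class="greeting">Hi</td></tr>
</tbody></table>"""

parser = TagLogger()
parser.feed(FRAGMENT)
parser.close()

for event in parser.events:
    print(event)
```

No exception is raised; the misplaced pieces simply show up in the event list, and it is then up to whatever consumes the events (in a browser, the tree builder) to decide what to do with them.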
I am creating a web site and need to copy certain HTML from one page to another (rather than rewrite it). Take this table for example:
I copy (from Page1.aspx)...
<table class="InfoBox">
<tr>
<td class="InfoBoxTopHeader">Data</td>
</tr>
<tr>
<td class="InfoBoxContent"></td>
</tr>
</table>
Then I paste (into Page2.aspx) and get....
<table class="InfoBox">
<tr>
<td class="InfoBoxTopHeader">Data</td>
</tr>
<tr>
<td class="auto-style1"></td>
</tr>
</table>
It takes the CSS from my CSS file, inserts it at the top of the ASPX file, and names it auto-style1. In this case that happened only for InfoBoxContent, but sometimes it happens for every class. It is driving me nuts.
I am using Visual Studio 2012. Any ideas why this is happening?
Does page 2 have a head section with a link tag pointing to that stylesheet? If the CSS isn't being applied, it may be because that specific page doesn't see it. If that's the case, you may need to add a tag pointing to that stylesheet.
Something like:
<LINK href="special.css" rel="stylesheet" type="text/css">
Or more like:
<!DOCTYPE html>
<HTML>
<HEAD>
<LINK href="special.css" rel="stylesheet" type="text/css">
</HEAD>
<BODY>
</BODY>
</HTML>
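If you want to check quickly whether a page references the stylesheet at all, a small sketch like the following may help. It uses Python's standard html.parser; the class name StylesheetFinder and the sample page are invented for illustration, and special.css is just the file name from the snippet above.

```python
from html.parser import HTMLParser

class StylesheetFinder(HTMLParser):
    """Collects the href of every <link rel="stylesheet"> in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased from the parser.
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel:
            self.hrefs.append(attr.get("href"))

# Hypothetical stand-in for Page2.aspx after rendering.
PAGE2 = """<!DOCTYPE html>
<html><head>
<LINK href="special.css" rel="stylesheet" type="text/css">
</head><body>
<table class="InfoBox"><tr><td class="InfoBoxContent"></td></tr></table>
</body></html>"""

finder = StylesheetFinder()
finder.feed(PAGE2)
finder.close()
print(finder.hrefs)
```

An empty list would suggest the page never links the stylesheet, which is the situation described above.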
I'm trying to convert HTML to PDF using htmldoc, but even basic HTML does not convert properly. I have this HTML:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>pdf test</title>
</head>
<body>
<table border="1">
<tr>
<td width="50%">
a
</td>
<td>
<p>
some address
</p>
<p>
some other text
</p>
</td>
</tr>
<tr>
<td>
test<br>
test2<br>
asdfasdf<br>
qwerqwer<br>
fasdfasdf
</td>
<td>
bla
</td>
</tr>
</table>
</body>
</html>
but it renders like this (test.pdf) when I use this command:
htmldoc --webpage --color --charset utf-8 -t pdf14 --size a4 test.html -f test.pdf
This is HTMLDOC version 1.9svn. I tried changing the charset and adding thead, tbody, etc., and nothing helped. Do you know what the problem could be?
It also doesn't accept style="padding: 10px" on those paragraphs.
The command:
htmldoc --size universal --webpage -t pdf --firstpage p1 -f test.pdf test.html
renders the page well for me. It is unclear from the question whether the UTF-8 charset, color, and PDF version options you passed are actually needed for your result or are the cause of the incorrect rendering.
My professor asked me to validate my HTML file with http://validator.w3.org/. It gives me errors.
The first error is:
Line 1, Column 1: no document type declaration; implying ""
Second error is:
Line 11, Column 64: required attribute "ALT" not specified
The attribute given above is required for an element that you've used, but you have omitted it. For instance, in most HTML and XHTML document types the "type" attribute is required on the "script" element and the "alt" attribute is required for the "img" element.
Typical values for type are type="text/css" for <style> and type="text/javascript" for <script>.
Can anyone tell me what is wrong? It displays just fine in my browser (I am using IE 8), but the professor says that if it fails this validation check then the assignment is incomplete. Any help would be great.
<html>
<head>
<title>Randy's first html web page !</title>
</head>
<body bgcolor="#000066" text="#00ff44">
<h1 align="center"> Hello Professor</h1>
<h2 align="center"> By: Randy White</h2>
<p> I haven't done anything like this before.</p>
<p> Seems to be ok</p>
<p align="center"><img src="Koala.gif" width="100" height="100">
<table border="1">
<tr>
<th>Month</th>
<th>Day</th>
<th>Year</th>
</tr>
<tr>
<td>December</td>
<td>1</td>
<td>2010</td>
</tr>
</table>
</body>
</html>
You need to add a doctype at the top of the HTML page, so that the browser understands what kind of document it is.
<!DOCTYPE html>
This should make a few of the other errors disappear. The rest is fairly self-explanatory: add the required attributes and so on.
Inside your img tag you need an alt attribute:
<img src='koala.jpg' alt='A Koala!'>
An example of a doctype declaration is this (taken from this very page):
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
The W3C has more info.
For a valid HTML file you need a doctype declaration before your HTML, and it must be the first line. See http://www.w3.org/QA/2002/04/valid-dtd-list.html for more information.
Second error: you need an alt="..." attribute in your img tag. It should describe the image for cases where the image is unavailable or can't be seen by the user.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Randy's first html web page !</title>
</head>
<body bgcolor="#000066" text="#00ff44">
<h1 align="center"> Hello Professor</h1>
<h2 align="center"> By: Randy White</h2>
<p> I haven't done anything like this before.</p>
<p> Seems to be ok</p>
<p align="center"><img src="Koala.gif" width="100" height="100" alt="koala image">
<table border="1">
<tr>
<th>Month</th>
<th>Day</th>
<th>Year</th>
</tr>
<tr>
<td>December</td>
<td>1</td>
<td>2010</td>
</tr>
</table>
</body>
</html>
I won't answer your questions, but assuming you have to hand it in tomorrow, here you go.
Add a doctype, and pass it through the w3c validator for the rest: validator.w3.org/
The alt error is for your image. alt attributes are required on images.
HTML is validated according to XML (or SGML) rules. Browsers have been developed to accept non-conformant HTML, so rendering correctly in a browser is not a test of validity. Similarly, there is good practice such as adding ALT attributes (for accessibility). Sighted humans don't need them, but many others do.
Therefore it is always good practice to create conformant documents.
We came across a strange behavior while using an HTML <link> tag. The issue was caused by improper usage of the tag. As a result, the page was being requested 3 times in Mozilla Firefox and 2 times in IE7. Here is the issue.
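The two errors discussed in this thread are easy to spot mechanically. The following is only a toy sketch using Python's standard html.parser; it is not a substitute for the W3C validator and checks nothing beyond these two points. The names BasicChecker and check are invented for the example.

```python
from html.parser import HTMLParser

class BasicChecker(HTMLParser):
    """Flags a missing leading doctype and <img> tags without alt."""

    def __init__(self):
        super().__init__()
        self.doctype_first = False
        self.seen_markup = False
        self.problems = []

    def handle_decl(self, decl):
        # Only counts if the doctype comes before any other markup.
        if decl.lower().startswith("doctype") and not self.seen_markup:
            self.doctype_first = True
        self.seen_markup = True

    def handle_starttag(self, tag, attrs):
        self.seen_markup = True
        if tag == "img" and "alt" not in dict(attrs):
            self.problems.append("<img> has no alt attribute")

def check(html_text):
    """Return a list of problem descriptions for the given HTML text."""
    checker = BasicChecker()
    checker.feed(html_text)
    checker.close()
    problems = list(checker.problems)
    if not checker.doctype_first:
        problems.insert(0, "no document type declaration")
    return problems

BEFORE = '<html><body><p><img src="Koala.gif" width="100" height="100"></p></body></html>'
AFTER = ('<!DOCTYPE html>\n<html><body><p>'
         '<img src="Koala.gif" width="100" height="100" alt="koala image">'
         '</p></body></html>')

print(check(BEFORE))
print(check(AFTER))
```

The first call reports both problems and the second reports none, which mirrors the two fixes suggested in the answers above.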
<link rel="stylesheet" type="text/css" title="Style" href=''/css/image.css'>
This was the code we were using in one of our J2EE applications. When we inspected the requests and responses (using HTTP Watch), we found that the page was requested 3 times from the server, and that the extra quote after href= was causing the problem. We could not work out why it caused multiple page requests. Is it because the extra quote makes the href empty, so the browser tries to load the stylesheet from the same URL that loaded the page?
Can someone please help us find out why this is happening? Any help will be greatly appreciated.
This explanation comes from an article on the Yahoo Developer site, in the section "Avoid Empty Image src":
When an empty string is encountered as a URI, it is considered a relative URI and is resolved according to the algorithm defined in section 5.2. This specific example, an empty string, is listed in section 5.4.
It is still not clear whether this behavior affects href (the article is mainly concerned with empty src), but it looks like it does:
Hopefully, browsers will not have this problem in the future. Unfortunately, there is no such clause for <script src=""> and <link href="">. Maybe there is still time to make that adjustment to ensure browsers don't accidentally implement this behavior.
Note: I have never encountered this behavior myself, so this is just a theoretical answer :)
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en" xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
</head>
<body>
<table width="100%" cellspacing="0" cellpadding="0" border="0" class="mainOutterTable">
<tbody>
<tr>
<td>
<table class="layoutColumn" cellpadding="0" cellspacing="0">
<tr>
<td style="width:100%;" valign="top">
<table class="layoutRow" cellpadding="0" cellspacing="0">
<tr>
<td valign="top" width="910">
<table class="layoutColumn" cellpadding="0" cellspacing="0">
<tr>
<td style="width:100%;" valign="top">
<table class="layoutRow" cellpadding="0" cellspacing="0">
<tr>
<td valign="top" width="294">
<table class="layoutColumn" cellpadding="0" cellspacing="0">
<tr>
<td style="width:100%;" valign="top">
<a name="7_N1K8HIC0GOO780I2B1KASD3047"></a>
<div class="wpsPortletBody">
This is a sample textf 3.
<link rel="stylesheet" type="text/css" title="Style" href=''/>
</div>
</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</tbody>
</table>
</body>
</html>
I was able to reproduce the problem on a test page.
This is mentioned in Section 5.2 of http://www.apps.ietf.org/rfc/rfc3986.html#sec-5.2
If the Relative path is empty, then the target URI is the base path.
if (R.path == "") then
T.path = Base.path;
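The same resolution rule can be observed with Python's urllib.parse.urljoin, which resolves a relative reference against a base URL. This is just a sketch: the page URL below is a hypothetical stand-in for the portal page, not a URL from the original application.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page that contains the <link> tag.
page_url = "http://example.com/portal/page.jsp"

# href='' : the empty reference resolves to the page's own URL,
# so fetching the "stylesheet" means requesting the page again.
empty_href = urljoin(page_url, "")

# href='/css/image.css' : the intended reference resolves to the CSS file.
intended_href = urljoin(page_url, "/css/image.css")

print(empty_href)
print(intended_href)
```

With the stray quote, the browser effectively sees the empty href, which would account for the extra requests to the page itself.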