Currently, I am working on a program that converts an html page into a PDF using the iText Library.
The Checker that I am using is PAC3 -->PDF Accessibility Checker 3 which is described by the following link (https://section508.gov/blog/check-pdf).
One of the issues is the “Alternate description missing for an Annotation”
An excerpt from the following link explains it:
http://www.uottawa.ca/respect/sites/www.uottawa.ca.respect/files/fss-fixing-accessibility-errors-in-pdfs.pdf
Alternative description missing for an annotation This usually happens when the link is not clear enough. To fix this error, add alternative text to the link tags. To add the alternative text, please do the following;
In the tag tree, select the tag for the link, and select Properties
from the options menu.
In the Touchup Properties dialog box, select
the Tag Tab.
Type alternate text for the link, and click close
I have been trying to use iText to fix this problem, but googling, looking at the source and reading the documentation does not help.
Does anybody have any suggestions on how to either write the HTML or use the itext problem to get rid of the “Alternate description missing for an Annotation”
Thank you for your help
You did not specify whether you using old code (XMLWorker, HTMLWorker) or new iText code (pdfHTML).
This of course impacts the proposed solution.
In my answer I am going to assume you are using pdfHTML
There are several options:
edit the incoming HTML using a library like JSoup
convert the incoming HTML to iText IElement objects, and edit those, setting properties where needed
write your own custom TagWorker that handles all instances of a specific tag, and write custom logic to deal with the missing annotations.
An example of a custom tag worker can be found here:
https://developers.itextpdf.com/content/itext-7-examples/converting-html-pdf/pdfhtml-custom-tagworker-example
I have a problem with a project that I have been working on for a while now. When I open an entry, the inputs all have data in them. However when I go to actually print the entry (using the print-mode, CTRL+P), I notice that the select, and textarea fields are missing the selected or input text on paper. What's going on here? I have googled for hours and can't come up with anything relevant... help! Thanks!
Link to Project (Click "Demo" to create a session - required)
Link to Entry (Click this after you click the 1st link)
Update : Before now I have just been using Google Chrome to test printing. However, when I tried FireFox, it printed alright, is there anything special I need to do to get Chrome to work??...
RESOLUTION : I am using jQuery and I found that by removing the ui-corner-all class from the inputs before printing, I could then print. Thanks anyways!
If you're using Internet Explorer, then this discussion thread may help you (summary: IE can get confused when printing malformed docs). Check your doctype tag matches your output (the w3c markup validator is very useful)
Edit - I see you're using chrome, but I suggest you still check your page is well formed :)
RESOLUTION: I'm using jQuery and I found that by removing the ui-corner-all class from the inputs before printing, I could then print. Thanks anyways!
This is something of a follow-up question to my question here. You can find the HTML source in a text file here.
When I load that page in IE8, I get the "Done, but with errors on page." message in my status bar. The detail view shows
Expected identifier
sms Line: 147
Code: 0 Char: 67
and I see absolutely no problems anywhere near there. In IE8, the page is still behaving erratically w/r/t the randomly losing focus as mentioned in my other question.
When I load the same exact page in Firefox (using Firebug) the console shows no errors and the page works perfectly. Any thoughts on what's going on here? This is driving me nuts and making me want to give up on even trying to write an IE friendly page.
Edit: Thanks for all the comments! This page is written as a JSP, so I edit in Eclipse. I found an Eclipse warning about the onblur event for the username field. I switched it from
onblur="alert(document.activeElement + ' class:' + document.activeElement.class)"
to
onblur="alert(document.activeElement)"
and that made the bizarre IE page error vanish. I had been trying to give more info (namely, its CSS class) about specifically which element is stealing focus - to my own detriment, apparently, since Javascript was interpreting the '.class' part in the Java(script) sense there is no class property (I should have been using className).
And, no, the page doesn't validate. But the errors were mostly/all ones that just didn't make sense to me, such as
Line 14, Column 41: Attribute "LANGUAGE" is not a valid attribute. Did you mean "language"?
But I'm still stuck trying to figure out why, as I enter text in the username & password fields, focus randomly switches to a div (working on figuring out which div currently).
Edit 2: It's the div between the two "global nav" comments, at the very top of the body. Still no idea why it's happening, though.
The problem appears to be the line
onBlur="alert(document.activeElement + ' class:' + document.activeElement.class)"
when you take off the ".class" it loads without issue. Are you sure ".class" is valid?
Does your HTML validate?
I agree, IE does a terrible job giving developers information about page errors. I only support IE on the principle that users shouldn't have to download twenty-odd browsers to go to their favorite websites. Web developers have a responsibility to make it "just work". Browser developers have a responsibility to communicate with each other and conform to standards.
Run your javascript through JSLint. You probably have a missing semicolon somewhere or a comma at the end of an object property that shouldn't be there. Firefox is more forgiving than IE when it comes to those types of syntax errors.
Edit: The inline js in your page seems to be OK. Check the contents of qm_scripts.js.
When I remove the .class from document.activeElement.class the error goes away.
The real issue is that you need to be able to debug your JS in IE 8, correct? There is a tool for that! :)
I'm looking for a tool that will give me the proper generated source including DOM changes made by AJAX requests for input into W3's validator. I've tried the following methods:
Web Developer Toolbar - Generates invalid source according to the doc-type (e.g. it removes the self closing portion of tags). Loses the doctype portion of the page.
Firebug - Fixes potential flaws in the source (e.g. unclosed tags). Also loses doctype portion of tags and injects the console which itself is invalid HTML.
IE Developer Toolbar - Generates invalid source according to the doc-type (e.g. it makes all tags uppercase, against XHTML spec).
Highlight + View Selection Source - Frequently difficult to get the entire page, also excludes doc-type.
Is there any program or add-on out there that will give me the exact current version of the source, without fixing or changing it in some way? So far, Firebug seems the best, but I worry it may fix some of my mistakes.
Solution
It turns out there is no exact solution to what I wanted as Justin explained. The best solution seems to be to validate the source inside of Firebug's console, even though it will contain some errors caused by Firebug. I'd also like to thank Forgotten Semicolon for explaining why "View Generated Source" doesn't match the actual source. If I could mark 2 best answers, I would.
Justin is dead on. The key point here is that HTML is just a language for describing a document. Once the browser reads it, it's gone. Open tags, close tags, and formatting are all taken care of by the parser and then go away. Any tool that shows you HTML is generating it based on the contents of the document, so it will always be valid.
I had to explain this to another web developer once, and it took a little while for him to accept it.
You can try it for yourself in any JavaScript console:
el = document.createElement('div');
el.innerHTML = "<p>Some text<P>More text";
el.innerHTML; // <p>Some text</p><p>More text</p>
The un-closed tags and uppercase tag names are gone, because that HTML was parsed and discarded after the second line.
The right way to modify the document from JavaScript is with document methods (createElement, appendChild, setAttribute, etc.) and you'll observe that there's no reference to tags or HTML syntax in any of those functions. If you're using document.write, innerHTML, or other HTML-speaking calls to modify your pages, the only way to validate it is to catch what you're putting into them and validate that HTML separately.
That said, the simplest way to get at the HTML representation of the document is this:
document.documentElement.innerHTML
[updating in response to more details in the edited question]
The problem you're running into is that, once a page is modified by ajax requests, the current HTML exists only inside the browser's DOM-- there's no longer any independent source HTML that you can validate other than what you can pull out of the DOM.
As you've observed, IE's DOM stores tags in upper case, fixes up unclosed tags, and makes lots of other alterations to the HTML it got originally. This is because browsers are generally very good at taking HTML with problems (e.g. unclosed tags) and fixing up those problems to display something useful to the user. Once the HTML has been canonicalized by IE, the original source HTML is essentially lost from the DOM's perspective, as far as I know.
Firefox most likley makes fewer of these changes, so Firebug is probably your better bet.
A final (and more labor-intensive) option may work for pages with simple ajax alterations, e.g. fetching some HTML from the server and importing this into the page inside a particular element. In that case, you can use fiddler or similar tool to manually stitch together the original HTML with the Ajax HTML. This is probably more trouble than it's worth, and is error prone, but it's one more possibility.
[Original response here to the original question]
Fiddler (http://www.fiddlertool.com/) is a free, browser-independent tool which works very well to fetch the exact HTML received by a browser. It shows you exact bytes on the wire as well as decoded/unzipped/etc content which you can feed into any HTML analysis tool. It also shows headers, timings, HTTP status, and lots of other good stuff.
You can also use fiddler to copy and rebuild requests if you want to test how a server responds to slightly different headers.
Fiddler works as a proxy server, sitting in between your browser and the website, and logs traffic going both ways.
I know this is an old post, but I just found this piece of gold. This is old (2006), but still works with IE9. I personnally added a bookmark with this.
Just copy paste this in your browser's address bar:
javascript:void(window.open("javascript:document.open(\"text/plain\");document.write(opener.document.body.parentNode.outerHTML)"))
As for firefox, web developper tool bar does the job. I usually use this, but sometimes, some dirty 3rd party asp.net controls generates differents markups based on the user agent...
EDIT
As Bryan pointed in the comment, some browser remove the javascript: part when copy/pasting in url bar. I just tested and that's the case with IE10.
If you load the document in Chrome, the Developer|Elements view will show you the HTML as fiddled by your JS code. It's not directly HTML text and you have to open (unfold) any elements of interest, but you effectively get to inspect the generated HTML.
In the Web Developer Toolbar, have you tried the Tools -> Validate HTML or Tools -> Validate Local HTML options?
The Validate HTML option sends the url to the validator, which works well with publicly facing sites. The Validate Local HTML option sends the current page's HTML to the validator, which works well with pages behind a login, or those that aren't publicly accessible.
You may also want to try View Source Chart (also as FireFox add-on). An interesting note there:
Q. Why does View Source Chart change my XHTML tags to HTML tags?
A. It doesn't. The browser is making these changes, VSC merely displays what the browser has done with your code. Most common: self closing tags lose their closing slash (/). See this article on Rendered Source for more information (archive.org).
Using the Firefox Web Developer Toolbar (https://addons.mozilla.org/en-US/firefox/addon/60)
Just go to View Source -> View Generated Source
I use it all the time for the exact same thing.
I had the same problem, and I've found here a solution:
http://ubuntuincident.wordpress.com/2011/04/15/scraping-ajax-web-pages/
So, to use Crowbar, the tool from here:
http://simile.mit.edu/wiki/Crowbar (now (2015-12) 404s)
wayback machine link:
http://web.archive.org/web/20140421160451/http://simile.mit.edu/wiki/Crowbar
It gave me the faulty, invalid HTML.
This is an old question, and here's an old answer that has once worked flawlessly for me for many years, but doesn't any more, at least not as of January 2016:
The "Generated Source" bookmarklet from SquareFree does exactly what you want -- and, unlike the otherwise fine "old gold" from #Johnny5, displays as source code (rather than being rendered normally by the browser, at least in the case of Google Chrome on Mac):
https://www.squarefree.com/bookmarklets/webdevel.html#generated_source
Unfortunately, it behaves just like the "old gold" from #Johnny5: it does not show up as source code any more. Sorry.
In Firefox, just ctrl-a (select everything on the screen) then right click "View Selection Source". This captures any changes made by JavaScript to the DOM.
alert(document.documentElement.outerHTML);
Check out "View Rendered Source" chrome extension:
https://chrome.google.com/webstore/detail/view-rendered-source/ejgngohbdedoabanmclafpkoogegdpob/
Why not type this is the urlbar?
javascript:alert(document.body.innerHTML)
In the elements tab, right click the html node > copy > copy element - then paste into an editor.
As has been mentioned above, once the source has been converted into a DOM tree, the original source no longer exists in the browser. Any changes you make will be to the DOM, not the source.
However, you can parse the modified DOM back into HTML, letting you see the "generated source".
In Chrome, open the developer tools and click the elements tab.
Right click the HTML element.
Choose copy > copy element.
Paste into an editor.
You can now see the current DOM as an HTML page.
This is not the full DOM
Note that the DOM cannot be fully represented by an HTML document. This is because the DOM has many more properties than the HTML has attributes. However this will do a reasonable job.
I think IE dev tools (F12) has; View > Source > DOM (Page)
You would need to copy and paste the DOM and save it to send to the validator.
Only thing i found is the BetterSource extension for Safari this will show you the manipulated source of the document only downside is nothing remotely like it for Firefox
The below javascript code snippet will get you the complete ajax rendered HTML generated source. Browser independent one. Enjoy :)
function outerHTML(node){
// if IE, Chrome take the internal method otherwise build one as lower versions of firefox
//does not support element.outerHTML property
return node.outerHTML || (
function(n){
var div = document.createElement('div'), h;
div.appendChild( n.cloneNode(true) );
h = div.innerHTML;
div = null;
return h;
})(node);
}
var outerhtml = outerHTML(document.getElementsByTagName('html')[0]);
var node = document.doctype;
var doctypestring="";
if(node)
{
// IE8 and below does not have document.doctype and you will get null if you access it.
doctypestring = "<!DOCTYPE "
+ node.name
+ (node.publicId ? ' PUBLIC "' + node.publicId + '"' : '')
+ (!node.publicId && node.systemId ? ' SYSTEM' : '')
+ (node.systemId ? ' "' + node.systemId + '"' : '')
+ '>';
}
else
{
// for IE8 and below you can access doctype like this
doctypestring = document.all[0].text;
}
doctypestring +outerhtml ;
I was able to solve a similar issue by logging the results of the ajax call to the console. This was the html returned and I could easily see any issues that it had.
in my .done() function of my ajax call I added console.log(results) so I could see the html in the debugger console.
function GetReversals() {
$("#getReversalsLoadingButton").removeClass("d-none");
$("#getReversalsButton").addClass("d-none");
$.ajax({
url: '/Home/LookupReversals',
data: $("#LookupReversals").serialize(),
type: 'Post',
cache: false
}).done(function (result) {
$('#reversalResults').html(result);
console.log(result);
}).fail(function (jqXHR, textStatus, errorThrown) {
//alert("There was a problem getting results. Please try again. " + jqXHR.responseText + " | " + jqXHR.statusText);
$("#reversalResults").html("<div class='text-danger'>" + jqXHR.responseText + "</div>");
}).always(function () {
$("#getReversalsLoadingButton").addClass("d-none");
$("#getReversalsButton").removeClass("d-none");
});
}