Image quality in "Web Page, Filtered" - html

This is an important issue and I would be grateful for a solution.
I convert a document (RTF) with several embedded images to HTML with Word 2010. The original quality of all images is 200 dpi. When saving as "Web Page, Filtered", all images are reduced to 96 dpi, despite of what I set in the "Save As" dialog -> Tools -> Web Options -> Pictures -> Pixels per inch. I would like to keep the original image quality.
Setting ActiveDocument.WebOptions.PixelsPerInch with VBA is also ignored. Note that other options, e.g., ActiveDocument.WebOptions.AllowPNG work fine.
When I save as "Web Page" (without Filtered; VML enabled) the images are saved in original quality. However, since the image names are numbered differently, I cannot simply copy the images to the filtered version. Matching the images by hand is not an option.
Any help would be appreciated!
Please do not suggest other Word/RTF to HTML converters - I have tried several and the results were not satisfactory for my document.
Note that I have also posted this in Microsoft Answers, but did not get an answer yet.
There is a similar but unanswered topic on stackoverflow.

I met this or similar problem too when trying to save to HTML a textbook manuscript with elaborate graphical figures: they would get dithered badly, messing up all fine line art and type. Completely by chance I found a solution that worked for me: Save it as IE 6.0 HTML rather than the default IE 4.0 HTML. (That can be adjusted e.g. via the Tools menu of the Save As dialog: Save As... -> Tools (in the top right corner) -> Web Options.) This way the images are saved as PNG not GIF and, quite magically, that also makes the Word image renderer do a much better job for the same DPI!

Sorry to be the bearer of bad news - this is by-default for Filtered webpage saves in Word. All mso tags are removed and it's made as small as possible by lowering the DPI to 96.

Tools->Options->General->Web Options->Pictures

Related

Semantic Screenshots for Web Browsers

An awful lot of modern web traffic (particularly on social media) consists of screenshots from web browsers. These typically include some formatted text, some layout, and some bitmap/vector graphics. E.g.,
It's really easy to take and share a screenshot, but it throws away lots of useful information and doesn't transfer well between devices (not to mention being far less amenable to things like screen readers for the blind and fancy data-mining). Of course the ironic part of this is that HTML/SVG is the perfect format for representing such data, and we're not using it even though it's right there.
html2canvas comes close to doing this, but doesn't properly handle images, see some semi-related discussion here.
My question is this, how can I select a visible area in my browser and save it in a format (ideally HTML) that preserves text and images and renders to something roughly similar when rendered separately? (so that it could be included as e.g. a data iframe for sharing).
I know that this is in general impossible, and that rendering HTML is a complicated task, but I feel like it should be possible to ask the browser something like "what elements are being rendered within these pixel coordinates?".
First:
Right click on page, then click on "Save page as".
Save it with a name that ends with .html (or .webarchive in some scenarios. See which works best for you).
Edit the now saved html file to only have the part you want (you can use any text editor. Sublime Text and Atom are usually suggested).
Then:
You can open it in your browser to see what you are up to.
You might want to inspect where the CSS is from too, and get that in your html's file folder, then link the html file to it, so as to preserve the styles.
As far as I understand, you'd want to bring all the CSS to be inline, or, at least, in the <head> section of the html file, so you can upload it as a single file, and don't need to keep linking it to the CSS file.

Good PDF to HTML Converter for Mobiles

We are having Multiple PDF which have account tables and balance sheet within it. We have tried many Converters but the result is not satisfactory. Can anybody please suggest any good converter that would replicated the contents of PDF to Exact structure in HTML. IF any paid Converter is there please suggest me .
This is the PDF we want to convert and Show in html "http://www.marico.com/html/investor/pdf/Quarterly_Updates/Consolidated%20Financial%20Results%20-%20Q3FY11.pdf"
Have you looked into this? http://pdftohtml.sourceforge.net/
It's open source as well, so it's free and can be modified if necessary.
There's even a demo showing the before PDF and the after HTML version. Not bad if you ask me.
If you're having issues specifically with tables in PDFs, perhaps the issue are the table themselves and whatever program is being used to generate them. Not all PDFs are created equal.
ALSO: Be aware that all PDFs that I've created and come across over the years have had lots of issues when it comes to copy/pasting blocks/lines of text that have other blocks/lines of text at equal or higher height on any given page. I think Acrobat lacks the ability to define a "sequence order" of what block is selected after what (or most programs don't use it properly), so the system sorta moves from a top-down, left-to-right way of selecting content.....even if that means jumping over large blank areas or grabbing lines from multiple columns at once when you wouldn't expect it. This may be part of your tabular data issue. Your weak link here is the PDF format itself and I think perhaps you may be expecting too much from it. Turning anything into a PDF is pretty much a one-way street, especially when you start putting lots of editable text into it.
Have you tried http://www.jpedal.org/html_index.php - there is also a free online version

Suggestions for making pixel-perfect CSS layouts?

One business goal requires that I make a form on screen that's pixel-perfect. If a user prints this form, it will exactly match the US Government Printing Office version of the form; the printer will produce a (reasonably) scannable copy of this document. The previous solution is PDF, which will only work to a certain point for us.
I'm leaning towards HTML/CSS, and would like suggestions on tools to assist that.
For tools, PixelPerfect in Firefox seems a good start. The target platform for this is (drum roll) IE6, if it helps. The document looks like this.
If HTML/CSS is a complete no-go, Adobe Flex is my next choice.
If pixel perfect printing is the goal, and not even PDF will get you there, you can pretty much give up straight away on printing from the browser. There are waaaay too many variables when rendering on the client side: from different browsers (IE6? Good luck!) to different fonts, to user settings, to A4 vs Letter size paper.
Could I ask why PDF doesn't suit?
I agree that pixel-perfect layouts are very, very hard to achieve with html/css, particularly with forms. However, I think pdfs can recieve input from external web forms, or have textfields that when filled out will print.
Flex outputting to pdf would be a good idea, but I don't think using flex as an rendering engine will help too much with this.
Another option would be to make the pdf and use a server-side langage to customize it with fields from a prior webform, and output the result. (Can easily be done with ruby/django/php, there are some good pdf libraries out there.)
First, abandon pixels. What you're looking for is a print stylesheet, with everything specified using physical units (cm/inches), font size in pt, etc. What is displayed on the screen, in what font size, and whether it is pixel-perfect or not doesn't seem relevant to your requirement of producing a scannable copy.
The question is now, is IE6 support for physical units and print stylesheets complete enough for that? Given my experience with making print stylesheets for clients, where IE would simply crash during the print process if you looked at it wrong, I'd say not too likely -- not with the complexity of the forms you're dealing with.
If you're worried about the renderer (IE, Acrobat, etc.) screwing up, you could always render the form on the server, and just serve an image to the user.
Dean, check out Prince. Bert Bos and Håkon Wium Lie used it for production of their book on CSS. They explain a bit about it in an A List Apart article.

Do you know a webpage appearance comparator?

I need a tool to compare the design of a website, I do not want to compare the HTML code only, but the output design.
Is this even possible? also is there any opensource program of this kind?
I have searched google, but I only get one candidate so far which is an HTML Match.
In modern webpages the appearance is controlled by various 'things': html code, css styles and images at least (also javascript in some pages). Simple text-based diff programs are not enough because their output can be irrelevant to the webpage appearance (i.e. cleaning up css can show many differences but the rendered webpage remains the same).
For simpler pages HTML Match mentioned above could do the job. If I have to compare the design of two "complex" pages (including layout, space, image and text changes) I would do a two-step approach:
Run a diff tool on the html sources to highlight the textual content differences. Then I would modify one of the pages to show the same content as the other (in order to make the next step more accurate and 'focused' to show 'real' layout changes). Of course it works only with very similar html.
Load the pages in the same web browser, get some screenshots from the rendered output at fixed positions and compare the images (i.e. with ImageMagick). It should show all visual differences in the rendered output.
It is not perfect but should work.
[UPDATE] HTML Match seems dead, see this answer for an alternative solution.
Solution: “compare web pages” tool. (“We've been doing it since 1999. It's free.”)
Example output (comparing pages for TP-Link USB hub model UH700 and UH720):
Under windows:
http://www.htmlmatch.com/
If you are using KDE, you can use Kompare or KDiff3.
However, if you want to view how your web page looks in different browsers in different operating systems, BrowserShots can used.
There are these online tools - that aren't brilliant:
http://www.w3.org/2007/10/htmldiff
http://www.aaronsw.com/2002/diff/
I like the look of daisydiff but have not used it in anger: http://code.google.com/p/daisydiff/
The keyword you're looking for is "diff".
A good program that can show you the differences between two files (html markup or other) would be ExamDiff for windows.
I'm working on one and i tell you it's hard and there is nothing on the market. Maybe Google and Bing have something inhouse. You can use some image comparison tools which identify rectangle regions of changed images. This is for example a part of all modern video compression but you have to do it for different regions of the webpage (the nav bar section, the main article, the region filtered by an ad-blocker etc.) as some of them may change and it's still considered the same content on the page.
As i said very complex problem with no exact solution.
The other is going the non visual way and just compare the resulting computed computer styles of each html element. You have to hack the browser to get access to the layout tree. There is also no official API or existing library/program/hack/patch for it.
You can make a visual comparison with Araxis Merge Pro by taking screen output with systems like BrowserStack, Cross Browser, PhantomJS

HTML or RTF to display columns of text with color-coded words

In my Delphi program I want to display some information generated by the application. Nothing fancy, just 2 columns of text with parts of words color-coded.
I think I basically have two options:
HTML in a TWebbrowser
RTF in a TRichEdit.
HTML is more standard, but seems to load slower, and I had to deal with The Annoying Click Sound.
Is RTF still a good alternative these days?
Note: The documents will be discarded after viewing.
I would vote for HTML.
I think it is more future oriented. The speed would not concern me.
The question of HTML or RTF may be irrelevant. If they are just used for display purposes, then the file format doesn't matter. It's really just an internal representation. (Are any files even being saved to disk?) I think the question to ask is which one solves the problem with the least amount of work.
I would be slightly concerned that the browser control is changing all the time. I doubt the richedit control will change much. I would lean towards the richedit control because I think there is less that could go wrong with it. But it's probably not a big deal either way.
Have you considered doing an ownerdraw TListView?
I'd also use HTML. Besides, you just got an answer for the clicking sound in TWebBrowser.
If you'd rather not use TWebBrowser, take a look at Dave Baldwin's free HTML Display Components.
I would vote for HTML, too.
We started an app a while ago...
We wanted to
display some information generated by the application. Nothing fancy, just...
(do you hear the bells ring???)
Then we wanted to display more information and style it even more....
...someone decided, that RTF isn't enough anymore, but for backwards compatibility we moved on to MS Word over OLE-Server. That was the end of talking about performance anymore.
I think if we would have done that in HTML it would be much faster now.
RTF is much easier to deal with, as the TRichEdit control is part of every single Windows installation, and has much less overhead than TWebBrowser (which is basically embedding an ActiveX version of Internet Explorer into your app).
TRichEdit is also much easier to use to programmatically add text and formatting. Using the SelStart and SelLength, along with the text Attributes, makes adding bolding and italics, setting different fonts, etc. simple. And, as Re0sless said, TRichEdit can easily be printed while TWebBrowser makes it more complicated to do so.
I would vote RTF as I dont like the fact TWebBrowser uses Internet explorer, as we have had trouble with this in the past on tightly locked down computers.
Also TRichEdit has a print method build in, where as you have to do all sorts of messing about to get the TWebBrowser to print.
Nobody seems to have mentioned a reporting component yet. Yes, it is overkill right now, but if you use it anyway (and maybe you already have got some reporting to do in your app, so the component is already included) you can just display the preview and allow to print / export to pdf later, if it makes any sense. Also if you later decide that you want to have a fancier display there is nothing holding you back.
If both HTML and RTF won't satisfy your need, you could also use an open source text/edit component that supports coloring words or create your own edit component based on a Delphi component.
Another alternative to the HTML browser is the "Embedded Web Browser" components which I used a few projects for displaying html documents to the user. You have complete control over the embedded browser, and I don't recall any clicks when a page is loaded.
I vote for HTML also
RTF is good only for its editor, else then you'd better go standard.
RTF offers some useful text editing options like horizontal tabulator which are not available in HTML. Automatic hyperlink detection is also a nice extra. But I think I would prefer HTML, if these features are not required.
I vote for HTML.
Easier to generate programmatically.
Widely supported.
Since you don't need WYSIWYG capabilities I think HTML advantages trump RTF. Moreover, should the need to export generated data for further, WP-like editing arise, remember that major word processor can open and convert HTML files.
Use HTML, but with 'Delphi Wrapper for Chromium Embedded' by Henri Gourvest , Chromium embedded uses the core that powers Google Chrome.
Don't use TWebBrowser, I'm suffering from all programs that use IE's web control - the font is too small on my 22' monitor with a 1920x1080 resolution, I use Windows 7 and my system's DPI is 150% (XP mode), I tried everything to tweak trying to fix that, no luck...