How are PDF sizes specified?

I am generating a PDF from HTML using a library, and all the size parameters I am giving it are in pixels. This seems kind of odd. I just googled for the A4 size in pixels; can I just use those values everywhere?
Is this how it should be done? Will the generated PDF look correct?
Otherwise, do I need to somehow compute the pixel size using information from the screen?
Then again, how do PDFs work if they can be sent to others and still look about the same?

PDF internally uses the same graphics model as PostScript. PDF is derived from PostScript. Basically,...
...it uses the very same operators that are available in PostScript, but renames them from being long and fully readable to short 1-, 2- or 3-letter abbreviations;
...however, it strips all features that make PostScript a full-blown programming language;
...and it adds a few new graphics capabilities, such as transparency and direct embedding of TrueType fonts.
PDF also uses the same basic measurement unit as PostScript: 72 points == 1 inch. You may also use fractions of points. This is the device independent way of stating dimensions.
You can use pixels if you want to. If you do, the absolute size of a graphic object on the display or the printed paper depends on the current resolution of the display or printer: a square of 72px x 72px is 1inch x 1inch at a 72dpi resolution, but 0.1inch x 0.1inch at a 720dpi resolution. Using pixels is therefore a device dependent way of stating dimensions.
A4 dimensions are 'width x height = 595 x 842 pt'.
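The point arithmetic above is easy to check; here is a minimal sketch (assuming only that A4 is 210 mm x 297 mm, 1 inch = 25.4 mm and 1 pt = 1/72 inch):

```javascript
// Convert physical lengths to PDF points (1 pt = 1/72 inch, 1 inch = 25.4 mm).
const mmToPt = (mm) => (mm / 25.4) * 72;

// A4 paper is 210 mm x 297 mm:
const a4WidthPt = Math.round(mmToPt(210));   // 595
const a4HeightPt = Math.round(mmToPt(297));  // 842
console.log(a4WidthPt, a4HeightPt);          // 595 842
```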

PDF is inherently a print medium, and its internal coordinates work in terms of 'points' (72pt per inch). The PDF rendering software (Acrobat, FoxIt, Ghostscript, etc.) will query the output device for its DPI rating and internally convert all the point-based coordinates into device-specific pixel sizes when it comes time to render the PDF for display or print.
You can certainly specify sizes in pixels while building a PDF. But remember that the physical size of a pixel differs between devices. A 300x300 pixel image will be a 1" x 1" square on a 300dpi printer, but 3" by 3" on a 100dpi monitor.
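The pixel-to-physical-size relationship in the answer above is just a division; a quick sketch:

```javascript
// Physical size of a pixel grid depends on the device resolution:
// inches = pixels / dpi.
const pxToInches = (px, dpi) => px / dpi;

console.log(pxToInches(300, 300)); // 1 -> a 1" x 1" square on a 300dpi printer
console.log(pxToInches(300, 100)); // 3 -> a 3" x 3" square on a 100dpi monitor
```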

I sometimes edit my pictures in GIMP. When I export to PDF, the resolution depends on the quality I need.
I just use this chart (for A4):
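The chart itself isn't reproduced here, but the typical A4 pixel-size table can be reconstructed from the formula pixels = mm / 25.4 * dpi (a sketch under the assumption that the chart listed the common export resolutions 72, 150 and 300 dpi):

```javascript
// A4 is 210 mm x 297 mm; pixels = mm / 25.4 * dpi.
const a4Px = (dpi) => ({
  width: Math.round((210 / 25.4) * dpi),
  height: Math.round((297 / 25.4) * dpi),
});

for (const dpi of [72, 150, 300]) {
  console.log(dpi, a4Px(dpi));
}
// 72  -> 595 x 842
// 150 -> 1240 x 1754
// 300 -> 2480 x 3508
```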

Related

How to use Capabilities in Air AS3 to determine correct dpi

I am currently looking into best practices for implementing images and graphics in a new Adobe AIR project, and would like to know the best practice for using images. I have a fair bit of experience in Flash for web development and am familiar with many of the standard coding issues. However, as I'm new to the mobile domain, I'd like to make sure I'm doing this in the best possible way, because the information available online is often contradictory to the tutorials and existing material.
I am currently building the project with the following: Adobe AIR, Flash CC for Android, testing on a Samsung g6.
I have my main app class, and I've set up a class to determine the system Capabilities etc., so I now have access to the following information to pre-plan my layout:
dpi : 640
stageResolution : 1440 width x 2464 height.
Adobe Templates for android as3 projects are 480 x 800.
Is it wise to stick with this as a size even though my target app is a higher resolution and to create all images and MovieClips to this size and to integrate a scaling mechanism for lower/higher resolution, or is it common practice to keep to the 480 x 800 template and allow for all resizing options within the code?
Having trawled through a number of links and articles on how Capabilities doesn't faithfully report the exact sizes and specifications, what dpi is best for images being loaded into the app?
For background images and gallery images, ie splashScreen embedded images etc, what dpi is best used for maximum quality?
I loaded a 1440 x 2464 *.jpg #72 dpi and it filled the screen perfectly on mobile, I also loaded a 1440 x 2464 *.jpg #640 dpi and couldn't notice any difference so is this effectively not worth worrying about?
Should I just use images sized at 72 dpi for everything (backgrounds, buttons etc.) and resize them, or work with their bitmapData, when I add them to the stage?
Example, I create a series of new images in photoshop, set them to 1 cm x 1 cm for a button in the app. I create the same in 72, 160, 640 dpi for variation and to see the difference.
I load them all into my project as is next to each other.
The 72 dpi 1cm x 1cm = Actual size on screen, just over 1 mm.
The 160 dpi 1cm x 1cm = Actual size on screen, just over 2.5 mm.
The 640 dpi 1cm x 1cm = Actual size on screen, just over 1.1 cm.
Clearly the definition and size are relative. The 640 dpi image is almost right to scale on screen, despite being slightly larger (by about 1 mm) than the image I created. But if I were to plan my layout using cm/mm in Photoshop or whatever program, it would fail miserably if I relied on these equations!?
I found links like this http://www.dallinjones.com/ which help give some insight into the conversions; however, it still doesn't add up as it should.
According to the pixels-to-millimeters formula:
mm = (pixels * 25.4) / dpi
My calculation in AS3 amounts to the following:
(1440 * 25.4) / 640 = 36,576 / 640 = 57.15 mm
But the actual screen width of the phone is 63+ mm!?
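The arithmetic in the question can be sketched directly, and working it backwards from the measured width shows where the discrepancy comes from (the name effectiveDpi is mine, for illustration; the reported 640 dpi is evidently not the exact physical density):

```javascript
// mm = (pixels * 25.4) / dpi -- the formula from the question.
const pxToMm = (px, dpi) => (px * 25.4) / dpi;

// 1440 px at the reported 640 dpi:
console.log(pxToMm(1440, 640).toFixed(2)); // "57.15"

// Working backwards from the measured ~63 mm screen width gives the
// dpi that is actually in play:
const effectiveDpi = (px, mm) => (px * 25.4) / mm;
console.log(Math.round(effectiveDpi(1440, 63))); // ~581
```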
Also I've seen that vector graphics aren't as effective when being used on mobiles.
I've seen documentation suggesting that vectors need to have 'cacheAsBitmap' set; does this apply to all vectors and drawn graphics?
Say I want to dynamically draw a background panel with rounded corners and a gradient that fills the stage.
drawBackgroundPanel(0, 0, stage.stageWidth, stage.stageHeight).
I create the rectangle, draw it, and add backgroundPanel to the stage. Do I then have to cache it as a bitmap as well?
Thank you for your help.

Why is DPI relevant for images taken by a camera for OCR

I am currently working on a project that involves using the Tess4j Tesseract OCR engine.
While working on this project I have come across a lot of websites stating that Tesseract works best on images of at least 300 DPI (dots per inch).
My question is why DPI is mentioned so often for images. I understand that when you scan a document you want to scan it at 300 DPI or more. I just cannot figure out why this is relevant for pictures taken with a camera.
As far as I know, DPI is a property of the printer: the higher it is, the smaller the printed image, but the greater the quality.
Now, if DPI has nothing to do with these images, then I am wondering why the results of my program differ when I change the DPI property of images between 72 and 300.
Is there a pre-process of Tesseract that I am unaware of?
Actually, it is the text size at a specific DPI.
Is there a Minimum Text Size? (It won't read screen text!)
There is a minimum text size for reasonable accuracy. You have to consider resolution as well as point size. Accuracy drops off below 10pt x 300dpi, rapidly below 8pt x 300dpi. A quick check is to count the pixels of the x-height of your characters. (X-height is the height of the lower case x.) At 10pt x 300dpi x-heights are typically about 20 pixels, although this can vary dramatically from font to font. Below an x-height of 10 pixels, you have very little chance of accurate results, and below about 8 pixels, most of the text will be "noise removed".
https://github.com/tesseract-ocr/tesseract/wiki/FAQ#is-there-a-minimum-text-size-it-wont-read-screen-text
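The FAQ's figures can be roughly reproduced with the point-to-pixel formula px = pt * dpi / 72 (the 0.5 x-height ratio below is my assumption for illustration; as the FAQ says, it varies considerably from font to font):

```javascript
// Height of the em square in pixels: px = pt * dpi / 72.
const ptToPx = (pt, dpi) => (pt * dpi) / 72;

// 10 pt text rasterized at 300 dpi:
const emPx = ptToPx(10, 300);      // ~41.7 px for the em square
// Assuming the x-height is roughly half the em, this lines up with the
// "about 20 pixels" figure quoted in the FAQ:
const xHeightPx = emPx * 0.5;      // ~20.8 px
console.log(emPx.toFixed(1), xHeightPx.toFixed(1));
```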

pdf2json Page Unit: What is it?

I'm trying to use modesty/pdf2json and the output is very useful, but I'm trying to figure out the measurement units the library uses. They call them "Page Units", and according to the PDF spec this isn't equal to 1/72 inch (the point), because an entire page is 51 Page Units in height.
Does anybody know what this Page Unit is? Where can I find info about this measurement?
Many thanks in advance.
TL;DR
The important thing to understand is that x, y and element width/height are relative units: they relate to the page width/height by a ratio, which can be translated to any destination unit by dividing out the existing units and multiplying by the desired ones.
Here are the boring details:
PDFs don't have a standard "size" -- you can print anything you like to PDF, which may include landscape or portrait orientation, different page sizes (Standard, A0-A5, Legal, Tabloid, Custom), etc. The page size of a PDF is stored in physical units (points, i.e. 1/72 inch), so the translation to pixels (including with pdf2json) is not a fixed "24px" as indicated in #async5's answer.
The key to programmatically getting the results you want is to utilize the parsed PDF information (page width and page height) along with how you need to render it (pixel count varies by density of display resolution but an "inch" is always an "inch") and how that translates to the destination resolution you're targeting.
Since the same physical device often supports multiple resolutions (changing the logical DPI), there may be a difference between the native pixel density and the synthesized density set by the user. The basis for translating from PDF units to a local display is therefore a scale factor made up of the difference between the PDF file's DPI and the target DPI of the physically rendered version of it. The same idea applies to PDF parsing libraries, which may use a different DPI than the native 72dpi of the PDF file itself.
While 96dpi is the Microsoft standard (72dpi is Apple's), choosing either doesn't give you a correct pixel offset, because pdf2json or pdf.js don't know anything about the end-user display. pdf2json coordinates (x/y) are simply relative measurements of a position on a plane (which is defined by a width/height). So standardizing to an 8.5"x11" page at 72dpi would be done as follows:
pdfRect.x = pdfRect.x * ((8.5 * 72) / parsedPdf.formImage.Width);
pdfRect.y = pdfRect.y * ((11 * 72) / parsedPdf.formImage.Pages[0].Height);
This kind of formula would work no matter what pdf2json's internal DPI is -- or frankly whatever other PDF parsing library you choose to use. That's because it cancels out those units by division and multiplying using whatever units you need. Even if today pdf2json internally uses 96dpi and downscales by 1/4 and later changes to 72dpi and downscaling by 1/2 the math above for converting to the pixel offset and dpi would work independent of that code change.
Hope this is helpful. When I was dealing with the problem it seemed the Internet was missing a spelled out version of this. Many people solving specific concrete source/destination resolution issues (including specific to a library) or talking about it in the abstract but not explaining the relationship very clearly.
Whatever pdf2json produces is not related to PDF.js units (PDF.js uses the standard PDF space unit as its base).
So based on https://github.com/modesty/pdf2json/blob/3fe724db05659ad12c2c0f1b019530c906ad23de/lib/pdfunit.js :
pdf2json gets data from PDF.js in 96dpi units
scales every unit by 1/4
So one page unit equals (96px/inch * 1inch / 4) = 24px.
In your example the height is 51 * 24px = 1,224px, or 51 * 0.25inch = 12.75inch.
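The conversion described in the answer above, as a minimal sketch (the 96dpi-divided-by-4 figure comes from the pdfunit.js source linked above and may change between pdf2json versions):

```javascript
// One pdf2json "page unit" = 96 px/inch / 4 = 24 px = 0.25 inch
// (per the linked pdfunit.js source; subject to change between versions).
const PAGE_UNIT_PX = 96 / 4;  // 24
const PAGE_UNIT_IN = 1 / 4;   // 0.25

const pageHeightUnits = 51;
console.log(pageHeightUnits * PAGE_UNIT_PX); // 1224 px
console.log(pageHeightUnits * PAGE_UNIT_IN); // 12.75 in
```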

Save Canvas as PNG for print

I know I can easily save a canvas to a PNG file in modern browsers. As it's a standard way for a browser to save canvas graphics as PNG, JPEG or BMP, I suppose it should work really well. But how would I save canvas graphics to print them later? If I use the standard methods, I will get an image the same size as the source canvas, at a low resolution of 72 dpi or thereabouts. Should I make the canvas larger, save a large image, and then convert it to 300dpi for print? Has anybody tried to use this for print? I know I could use a PDF generator library, but I want to try the standard ways first.
Yes, make the canvas larger and save a large image.
HTML5 canvases have no sense of DPI — one pixel on the canvas equals one pixel on your screen. The quality of the print depends on what you're printing (aliased vs anti-aliased graphics) and type of printer (ink jet, laser).
If you wanted exactly 300 DPI, use something like a screen ruler to measure the DPI of your monitor (say, 72 DPI), divide 300 by that (about 4.2 in this case), and make the canvas that many times bigger.
Alternatively you could think about using SVG and drawing graphics with vectors. Then you'd effectively have infinite DPI. (Think of Adobe Illustrator vs. Photoshop.)
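The scale-up the answer suggests is simple arithmetic; here is a sketch of just the size calculation (the function names are mine, and the 96dpi default is an assumption; in practice you would measure or estimate the screen DPI as described above, then draw into a canvas of the computed size):

```javascript
// How much larger to make a canvas so that, printed at targetDpi, it
// covers the same physical area it covers on a screenDpi display.
function printScale(targetDpi, screenDpi) {
  return targetDpi / screenDpi;
}

function printSize(widthPx, heightPx, targetDpi, screenDpi = 96) {
  const s = printScale(targetDpi, screenDpi);
  return { width: Math.round(widthPx * s), height: Math.round(heightPx * s) };
}

// An 800x600 canvas, intended to print at 300 dpi from a 96 dpi screen:
console.log(printSize(800, 600, 300)); // { width: 2500, height: 1875 }
```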

HTML elements physical size

I've just gotten into web development seriously, and I'm trying to make a page that appears the same physical size (in inches) across all browsers and platforms.
I believe a combination of percentage values and inch values can make a consistent UI.
My own system is a 15.4-inch screen with 1920x1200 pixels, i.e. 144 DPI.
Here is the simplest HTML code that fails to appear the right size in any browser except Firefox (tried in Chrome 3 and 4, Opera 10.5, IE7):
<html><head></head>
<body>
<div
style="position:absolute; width:2in; height:1in; border:1px solid" >
hello world</div>
</body></html>
Chrome, Opera and IE render a .67 inch box ( They seem to be assuming a 96 DPI screen )
I am running Windows XP, but I see no reason why that would make a difference. Similar incorrect rendering on other machines I have tested.
I thought when I say "1in" in HTML it means one actual inch in the real world....
How do I handle this?
Thanks in advance,
Vivek
Edit :
In 2006 I developed an ActiveX control which did live video editing for a website. In 2008 we started seeing lots of Vista use and higher-DPI screens, which made the UI unusable, so I reworked the application so that everything scaled according to DPI. Since then, everyone's happy that they don't need glasses to use the feature.
The whole reason that Win7 and Vista have this "DPI scaling" mode is to allow non-DPI-aware apps to display (but since it basically scales up the app's canvas, the apps look blurry).
I can't believe that calling GetDeviceCaps() or the X-Windows equivalent is harder than hardcoding 96 DPI. Anyway, it wouldn't affect any page that measures in pixels.
Can't be done. Period.
Screens are a grid of pixels and that is the only unit recognized by the physical display. Any other measurement must be converted to pixels for display.
This conversion is done by the operating system's graphics subsystem, which has an internal "DPI" setting - quoted because it's arbitrary and does not necessarily correspond to the actual physical DPI of any real-world device. Windows, for example, defaults to 96 DPI regardless of the display that it's connected to.
When you look at a page with something with a height of "1in", your machine looks at it, calculates that, since it's set for 144 DPI, "1in" means 144 pixels and makes it 144 pixels tall, regardless of the physical distance that those 144 pixels will occupy on your display.
When the typical Windows user with the default 96 DPI setting looks at it, their computer calculates that "1in" = 96px and makes it 96 pixels tall, again regardless of the physical distance that will correspond to those 96 pixels.
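The two calculations described above come down to one multiplication (a sketch; the 144 and 96 figures are the DPI settings from the question and the Windows default respectively):

```javascript
// How the OS/browser resolves a CSS "1in": px = inches * logicalDpi.
const inchesToPx = (inches, dpi) => inches * dpi;

console.log(inchesToPx(1, 144)); // 144 px on the asker's 144 dpi setting
console.log(inchesToPx(1, 96));  // 96 px on the Windows default of 96 dpi

// A 96 px box on a true 144 dpi panel is physically 96/144 of an inch:
console.log((96 / 144).toFixed(2)); // "0.67"
```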
And you know what? This is a good thing. It means that a user with poor vision can lower their video driver's DPI setting and everything sized in inches, cm, point, em, or other 'real-world' units (i.e., basically anything other than percent or pixels) will get bigger so that they can see it more easily. Conversely, those of us who don't mind small text can artificially increase our DPI settings to shrink it all and get more onto our screens.
The tyranny of the printed page, forcing things to be a certain size whether we like it or not, is no more. Embrace the power this gives your users over their experience. Don't try to take that away - you won't succeed and you'll just annoy people in the process.
Instead of giving the size in px, give it in percentages, so that it fits on any screen based on the percentage.
I don't really think you can, to be honest.
The DPI of the screen is determined by the hardware, i.e. a 15.4" screen with a 1920x1200 resolution is about 144 DPI.
Why do you need the site to appear as the same physical dimensions? Surely the same size, proportional to the screen, is enough? I.e. if my resolution is 1920x1080, your site takes up 50% of the width; if I'm using 1600x1050, it takes up 60%?
In short — you can't, at least for use on screen.
Using real world units depends on having clients knowing the correct DPI for the system. Between clients that don't bother, and systems which are not configured correctly, it is a hopeless situation.
If you want something to scale so it is a reasonable size for the users system, then you are likely best off using em units to scale based on the user's font size. This still has problems where the defaults are not suitable and the user hasn't changed them, but it is right more often than physical units.