Using TET to get Font List from PDF returns length:fonts = 0 - pdflib

This always returns 0 even though the PDF has several embedded fonts.
I'm using TET 4.1 with PHP 5.3 on Linux.
$fontCount = (integer) $tet->pcos_get_number($doc, "length:fonts");
All other calls using $tet->pcos_get_number($doc, "xxxx") work fine.

Without knowing the details I can only guess:
the document contains fonts only within form fields
no fonts are used and the visible text comes from raster images
In this case, it might be a good idea to contact the vendor directly.
http://www.pdflib.com/licensing-support/opening-a-support-case/

Way To Modify HTML Before Display using Cocoa Webkit for Internationalization

In Objective C to build a Mac OSX (Cocoa) application, I'm using the native Webkit widget to display local files with the file:// URL, pulling from this folder:
MyApp.app/Contents/Resources/lang/en/html
This is all well and good until I start to need a German version. That means I have to copy en/html as de/html, then have someone replace the wording in the HTML (and some in the JavaScript, such as the modal dialogs) with German phrasing. That's quite a lot of work!
Okay, that might seem doable until this creates a headache where I have to constantly maintain multiple versions of the html folder for each of the languages I need to support.
Then the thought came to me...
Why not just replace the phrasing with template tags like %CONTINUE%
and then, before the page is rendered, intercept it and swap it out
with strings pulled from a language plist file?
Through some API with this widget, is it possible to intercept HTML before it is rendered and replace text?
If it is possible, would it be noticeably slow such that it wouldn't be worth it?
Or, do you recommend I do a strategy where I build a generator that I keep on my workstation which builds each of the HTML folders for me from a main template, and then I deploy those already completed with my setup application once I determine the user's language from the setup application?
Through a lot of experimentation, I found an ugly way to do templating. Like I said, it's not desirable and has some side effects:
You'll see a flash on the first window load. On first load of the application window that has the WebKit widget, you'll want to hide the window until the second time the page content is displayed. I guess you'll have to use a property for that.
When you navigate, each page loads twice. It's almost not noticeable, but not good enough for good development.
I found an odd quirk with Bootstrap CSS where it made my table grid rows very large and didn't apply CSS properly for some strange reason. I might be able to tweak the CSS to fix that.
Unfortunately, I found no other event I could intercept on this except didFinishLoadForFrame. However, by then, the page has already downloaded and rendered at least once for a microsecond. It would be great to intercept some event before then, where I have the full HTML, and do the swap there before display. I didn't find such an event. However, if someone finds such an event -- that would probably make this a great templating solution.
- (void)webView:(WebView *)sender didFinishLoadForFrame:(WebFrame *)frame
{
    // Grab the root <html> element of the page that just finished loading.
    DOMHTMLElement *htmlNode =
        (DOMHTMLElement *)[[[frame DOMDocument] getElementsByTagName:@"html"] item:0];
    NSString *s = [htmlNode outerHTML];
    // Already substituted on a previous pass: stop here, or the reload below would loop forever.
    if ([s containsString:@"<!-- processed -->"]) {
        return;
    }
    NSURL *oBaseURL = [[[frame dataSource] request] URL];
    s = [s stringByReplacingOccurrencesOfString:@"%EXAMPLE%" withString:@"ZZZ"];
    // Mark the page as processed, then reload the frame with the substituted markup.
    s = [s stringByReplacingOccurrencesOfString:@"</head>" withString:@"<!-- processed -->\n</head>"];
    [frame loadHTMLString:s baseURL:oBaseURL];
}
The above looks for %EXAMPLE% in the page's HTML, replaces it with ZZZ, and reloads the frame with the modified markup.
In the end, I realized that this is inefficient because of the page flash and, on long pages that need a lot of replacing, a potentially noticeable delay. The better way is to create a compile-time generator: keep one HTML folder with %PARAMETERIZED_TAGS% inside instead of English text, then add a "Run Script" step to your "Build Phases" that runs a program or script, written in whatever language you like, that generates an HTML folder for each of the lang-XX.plist files in a directory, where XX is a language code like 'en', 'de', etc. It reads each HTML file, looks up each parameterized tag in the lang-XX.plist file, and replaces the tag with the text for that language.
That way, after compilation, you have one HTML folder per language, already using your translated strings. This is efficient because you maintain a single HTML folder for your code and avoid the extremely tedious process of creating and maintaining an HTML folder by hand for each language. The downside is that you have to build that compile-time generator yourself.
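A minimal sketch of such a compile-time generator, written in Python purely as an illustration (the answer does not prescribe a language). The directory layout, the function names `render` and `build`, and the convention that each plist key is the tag name without the surrounding percent signs are all assumptions, not something the original specifies.

```python
import plistlib
import re
from pathlib import Path

# A tag is an upper-case identifier wrapped in percent signs, e.g. %CONTINUE%.
TAG = re.compile(r"%([A-Z0-9_]+)%")


def render(template: str, strings: dict) -> str:
    """Replace each %TAG% with its translation; unknown tags are left as-is."""
    return TAG.sub(lambda m: str(strings.get(m.group(1), m.group(0))), template)


def build(template_dir: Path, lang_dir: Path, out_dir: Path) -> None:
    """Write out_dir/<XX>/html for every lang-XX.plist found in lang_dir."""
    for plist_path in sorted(lang_dir.glob("lang-*.plist")):
        lang = plist_path.stem[len("lang-"):]  # "lang-de" -> "de"
        with plist_path.open("rb") as fh:
            strings = plistlib.load(fh)  # assumed: flat dict of tag -> text
        for src in template_dir.rglob("*"):
            if not src.is_file():
                continue
            dest = out_dir / lang / "html" / src.relative_to(template_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            if src.suffix in {".html", ".js"}:
                # Text files get their tags substituted.
                text = src.read_text(encoding="utf-8")
                dest.write_text(render(text, strings), encoding="utf-8")
            else:
                # Images, CSS and other assets are copied unchanged.
                dest.write_bytes(src.read_bytes())
```

A "Run Script" build phase could then call `build()` with the template folder, the folder holding the lang-XX.plist files, and the app's Resources/lang output folder. Leaving unknown tags in place (rather than failing) makes missing translations easy to spot in the generated pages.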

Text heavy iOS App. Store text in HTML, Plist, or Other?

I'm writing a relatively complex iOS app that is very text heavy.
The text is also heavily formatted. It has lots of color, size, font, and spacing changes, as well as bulleted lists and other text features you'd expect to see in a very rich website.
The text is displayed on about 40 different views. Some of which display a lot of text, others a little. There is no one template that all the pages follow. (There are some that are similar, but that's not the point.)
Lastly, the text is constantly being changed and updated by an editorial team during development, not so much after release. The text has to be stored on the device, downloading files is not an option.
My question is, what is the best way to store and then render all this text in an iOS App?
My approach
Store all the text content and formatting info in an html file and use
[[NSAttributedString alloc] initWithFileURL:htmlDoc
                                    options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType}
                         documentAttributes:&attrDict
                                      error:&error];
to create an NSAttributedString and use that to populate UITextViews.*
*Note: I would do some more work before creating the UITextViews. First I would parse it to find the appropriate page number [[Page:1.3]] and then parse the elements in that section [[header]], [[side_scroller]], etc...
I like this approach for two main reasons:
It created a separate copy document that contained all the text
and formatting info.
I'm the only iOS developer, but we have a couple front-end
developers. So when we get slammed with changes that need to be done
in 3.45 minutes, I could have some of the guys help me make the
changes, without having to know all the nuances of UIFont and
related classes. Occasionally, the editors could even make the
changes themselves :)
Minor reasons for liking this approach:
The text can vary so much per page, that creating a new UIFont + Plist entry to store the formatting info seems like a bigger pain than having everything in a .html document. (I could be wrong about this.)
Project managers will inevitably say: "Make this word a little bigger," "This word looks strange, add italics," and "Make everything purple!" HTML/CSS seems like a more flexible solution for quickly implementing these requests.
Downsides of this approach:
NSAttributedString picks up 99% of the HTML attributes I threw at it. It did not pick up bullet spacing changes in unordered lists <ul>.
Plists are more performant.
Here are some other approaches I considered:
Plist + UIFont
RTF Document - Originally started with this, but found it hid a lot of what was going on and NSAttributedString wouldn't pick up some of the changes.
XML
Any advice or input would be very appreciated.
Notes:
iPad app,
iOS 7,
No Internet Connectivity,
Xcode 5
What I did to store styled text in an iOS app was to write a Mac OS command line tool that opens RTF files and converts them to attributed strings (It's a 1-line call in Mac OS, but not supported in iOS for some reason.) I then use NSCoding to save the attributed strings as binary data, with a special .DATA filetype.
I created a custom UITextView category with a method that knows how to load the text view's attributed text from my custom filetype.
I created a build rule in my project that treats RTF files as source files in a build step and the .DATA filetype as the output, and copies the .DATA files into the build project.
Now, all I have to do is add an RTF file to my project, and the build process inserts the .DATA version of the styled text into the executable.
The Xcode editor knows how to edit RTF files, so you can edit them right in place in the IDE, OR you can edit them in TextEdit or any editor that supports RTF files.
There are a few things you can put in an RTF that aren't supported in UITextViews. (I don't remember what those are offhand. Sorry.)
I find styled WYSIWYG text much easier to deal with than HTML. You just edit the text, and the build process picks up the changes.
It worked beautifully. Plus, binary NSCoding output is a whole lot more compact than HTML.
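I would recommend using a web view. It can open files from the resource bundle.
You can disable all the links in the HTML by implementing the delegate method shouldStartLoadWithRequest to return NO for link clicks (navigation type UIWebViewNavigationTypeLinkClicked); returning NO for every request would also block the initial page load.
You might also want to set dataDetectorTypes to UIDataDetectorTypeNone.
That will disable automatic link detection in the web view.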

dynamic HTML page to pdf

I know there is a list of similar questions but all handle pages without user interaction (static even though some js may be there).
Let's say we have a page the user can interact with (e.g. an SVG that changes, or HTML tables with drilldown where the content changes). Those interactions will change the page. The same happens on Stack Overflow when entering a question...
The idea is adding a button, "convert to pdf" taking the state of the html and sending to the user back a pdf version (we've a Java server).
Using the print of the browser is not the answer I'm looking for :-).
Is this a stick in the moon ?
You would have to store the parameters that generate the HTML view (i.e. what the user clicks on, what selections they make, etc). If you can have a list of parameters that generate the HTML view, you can have a method which accepts the list of parameters (JSON post?), generates the HTML view and passes it to your PDF generating routine. I'm not too familiar with Java libraries for this purpose, but PHP has TCPDF, which can take HTML output and generate a PDF from it. Certainly, there are Java libraries which will allow you to do the same thing, or you can use the parameters to get a list of rows/arrays which can be iterated over and output using the PDF library of your choice.
Both iTextPDF and Aspose.PDF would allow you to do that (I've seen them used in two different projects), but there is no magic and you will have to do some work.
The steps are roughly:
Get (as a string) the part of the document which you want to print with jQuery or innerHTML
Call a service on the server side to convert this to PDF
[Serverside] Use a whitelist-based tool to clean up the HTML (unless you want to be hacked). JSoup is great for that.
[Serverside] Use the iText or Aspose API to create the PDF from the HTML (this is not trivial, you will have to read the docs)
Download the document
I'd also recommend DocRaptor, an HTML to PDF API built by my company, Expected Behavior.
DocRaptor uses Prince XML to generate PDFs, and thus produces higher quality results than similar products.
Adding PDF generation to your own web application using our service is as simple as making an HTTP POST request to our server.
Here's a link to DocRaptor's home page:
DocRaptor
And a link to our API documentation:
DocRaptor API documentation

Is it possible to have a html code and all images in one file?

I want to have a html file with javascript. Then I want to have some images in this file. I want to send this html file to my friends (per e-mail). I want them to see my html file with images but I do not want to send them all files with all images. It would be nice to send them just one file.
I also do not want to have images on a web-server.
I also do not want to send them an archive with all the files (since they then need to open this archive).
Am I asking too much, or is it possible to do what I want?
ADDED
I do not want my friends to see the html file in a mail-client. I want to send a file as an attachment. So, they can save it and then open with a browser.
Yes, it is possible:
# HTML
<img src="................." />
# CSS
background-image: url(.................)
The file's contents are encoded with Base64, which makes it easy to represent binary data as plain text.
Find out more on wikipedia: Data URI scheme.
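As a rough illustration of how such a value can be produced, here is a small Python sketch that builds a data URI from raw bytes and wraps it in an <img> tag. The helper names (`data_uri`, `file_to_data_uri`, `img_tag`) are made up for this example, and guessing the MIME type from the file extension is an assumption that may not suit every file.

```python
import base64
import html
import mimetypes
from pathlib import Path


def data_uri(data: bytes, mime: str) -> str:
    """Return a data: URI holding the given bytes, Base64-encoded."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{payload}"


def file_to_data_uri(path: str) -> str:
    """Read a file and guess its MIME type from the extension (assumption)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        mime = "application/octet-stream"
    return data_uri(Path(path).read_bytes(), mime)


def img_tag(data: bytes, mime: str, alt: str) -> str:
    """Build an <img> element whose src is an inline data URI."""
    return f'<img src="{data_uri(data, mime)}" alt="{html.escape(alt)}" />'
```

The string returned by `file_to_data_uri("photo.png")` (file name hypothetical) can be pasted into the src attribute or a CSS url(...). Base64 grows the data by roughly a third, so large images make for a large HTML file.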
Depending on whether the mail client supports it, you could in theory use the data URI scheme, like so:
<img src="data:image/png;base64,
iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAABGdBTUEAALGP
C/xhBQAAAAlwSFlzAAALEwAACxMBAJqcGAAAAAd0SU1FB9YGARc5KB0XV+IA
AAAddEVYdENvbW1lbnQAQ3JlYXRlZCB3aXRoIFRoZSBHSU1Q72QlbgAAAF1J
REFUGNO9zL0NglAAxPEfdLTs4BZM4DIO4C7OwQg2JoQ9LE1exdlYvBBeZ7jq
ch9//q1uH4TLzw4d6+ErXMMcXuHWxId3KOETnnXXV6MJpcq2MLaI97CER3N0
vr4MkhoXe0rZigAAAABJRU5ErkJggg==" alt="Red dot" />
Again, the support is mail client dependent. Some might not support it at all. Some might truncate after X bytes. Et cetera. As far as I know, not many of them support it. Furthermore, I don't see any other way to inline images in HTML like that. Until support is widespread, your best bet is really to send the images along as attachments.
Update as per the OP's update: well, most modern web browsers support it. The aforementioned Wikipedia link even lists them in detail.
Data URIs are currently supported by the following web browsers:
Gecko-based, such as Firefox, XeroBank, Camino, Fennec and K-Meleon
Konqueror, via KDE's KIO slaves input/output system
Opera (including devices such as the Nintendo DSi or Wii)
WebKit-based, such as Safari (including on iPhones), Android's browser, Epiphany and Midori (WebKit is a derivative of Konqueror's KHTML engine, but Mac OS X does not share the KIO architecture so the implementations are different), as well as Webkit/Chromium-based, such as Chrome and Iron
Internet Explorer 8: Microsoft has limited its support to certain "non-navigable" content for security reasons, including concerns that JavaScript embedded in a data URI may not be interpretable by script filters such as those used by web-based email clients. Data URIs must be smaller than 32 KiB.
Note that IE8 truncates the string after 32KB. So, as long as the images aren't that large, you could use the data URI scheme for IE8 users. It's not supported on IE7 and lower.
I am not aware of a way to accomplish what you're after with 100% certainty it will work.
Is there a way to forgo the images? Perhaps an ascii representation instead? (something like this http://www.text-image.com/)
The archive would be the only "single file" option that I'm aware of.
You can't execute JavaScript from a mail client. You can inline the images, but you will need a library because doing it by hand is non-trivial.
You should just send them a link.
Why don't you just link the images with relative paths, and bundle them in a folder with the html file and send it archived and compressed (zip or tarball, depending on preference)?
If you just want to send one file, just zip it using your favorite compression program.
You should never, under any circumstances, send email whose body is HTML. Send plain text mail with the images as MIME attachments, or better yet, put the images on a website (I hear Flickr is quite good ;-) and send them URLs.
I'm going to say it again, because it needs to be said more often: email must be plain text.

Outlook HTML Mail - changing linked items to embedded

I'm attempting to send HTML formatted emails using C# 3 via Outlook.MailItem
Outlook.MailItem objMail = (Outlook.MailItem)olkApp.CreateItem(Outlook.OlItemType.olMailItem);
objMail.To = to;
objMail.Subject = subject;
objMail.HTMLBody = htmlBody;
The email is generated externally by saving from an RTF control (TX Text Control), which yields HTML with links to images stored in a <<FileName>>_files subdirectory. Example:
<img border="0" src="file:///C:/Documents%20and%20Settings/ItsMe/Local%20Settings/Temp/2/zbt4dmvs_Images/zbt4dmvs_1.png" width="94" height="94" alt="[image]">
Sending the email this way generates a mail with broken links.
Using Outlook 2007 as the email client with Word as the email editor, switching to RTF (Options tab, Format tab group) preserves the layout and inlines the images.
Programmatically doing this via:
var oldFormat = objMail.BodyFormat;
objMail.BodyFormat = Outlook.OlBodyFormat.olFormatRichText;
objMail.BodyFormat = oldFormat;
loses the formatting and mangles the images (the image becomes a [image] link marker on screen which is clickable but no longer shows the image). This isn't a surprise given that the documentation for MailItem.BodyFormat Property says "All text formatting will be lost when the BodyFormat property is switched from RTF to HTML and vice-versa".
Sadly there doesn't seem to be an easy way to change the Type of each Attachment in MailItem.Attachments to OlAttachmentType.olByValue, as it's a read-only property that's set when you create the Attachment.
An approach that comes to mind is to walk the HTML, replacing the <img> tags with markers and programmatically walking the MailItem text, inserting an Outlook.Attachment of Type OlAttachmentType.olByValue.
Another option is to convert the <img> links to use src="cid:uniqueIdN" and add the images as attachments with the referenced identities.
So, to the question... Is there a way to get the linked images converted to embedded images, ideally without getting into third party tools like Redemption? Converting to RTF happens to yield the outcome, but doing it that way is by no means a pre-requisite, and obviously may lose fidelity - I Just Want It to Just Work :D Neither of my existing ideas sound Clean to me.
Since you are using .NET > 2.0, you may want to look into the System.Net.Mail namespace for the creation of mail messages. I have found that it's quite customizable and was very easy to use for a task similar to yours. The only problems I had were making sure I was using the right encoding, and having to use HTML tables for layouts (CSS would not work right). Here are some links to show you how this works...
Basic
With multiple views (Plain Text and HTML)
If that's not an option, then I would recommend going the Content ID route and embedding the images as attachments. Your other option is to host the images publicly on a website, and change the image links in the html to the public images.
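The HTML-rewriting half of the Content ID route can be sketched as follows. This is a Python illustration only, not Outlook or System.Net.Mail code: the function name `to_cid_references` and the `imageN` identifier scheme are invented for the example, and the regular expression assumes double-quoted src attributes like the ones in the question.

```python
import re

# Matches an <img ...> tag up to and including a double-quoted src value.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def to_cid_references(html: str):
    """Rewrite local <img> sources to cid: references.

    Returns the rewritten HTML and a dict mapping each original src
    to the content ID that the matching attachment must carry.
    """
    cids = {}

    def swap(match):
        src = match.group(2)
        # Leave images that are already embedded or hosted remotely alone.
        if src.startswith(("cid:", "data:", "http:", "https:")):
            return match.group(0)
        # Reuse the same ID when one image appears more than once.
        cid = cids.setdefault(src, f"image{len(cids) + 1}")
        return f"{match.group(1)}cid:{cid}{match.group(3)}"

    return IMG_SRC.sub(swap, html), cids
```

Each entry in the returned mapping then has to be attached to the message with its Content-ID set to the same identifier; how that is done depends on the mail API in use and is not shown here.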
Something that you should be cognizant of is that HTML emails can easily look like spam and can be treated as such by email servers and clients. Even ones that are just for in-house usage (it's happened to me) can end up in Outlook's Junk Mail folder.
DOH!, actually sending the email in Outlook 2007 forces the images to become embedded.
The Sent Item size of 8K is a lot smaller than the draft size of 60K (RTF) I was seeing vs the draft size of 1K (HTML that hadn't been converted to RTF and back again).
So it was Doing What I Mean all the time. Grr.
I'll leave the Q and the A up here in case it helps someone of a similarly confused state of mind.
BTW some useful links I found on my journey:
Sending emails example
General Q&A site with other examples of varying quality