Parse and Display Table from HTML String in NSMutableAttributedString - html

I have been in the process of converting my Android/Java app into iOS/Swift, and have run into an issue regarding table generation.
The app displays a list of content pulled from a web source as HTML strings and shows it in a UITableView (RecyclerView on Android). Creating an NSMutableAttributedString with NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType works fine for basic HTML formatting (bold, italics, lists, etc.), but I need to render tables as well.
In my Android app, I created blocks of content split at tags, and if it was a table, populated a TableView and inserted it into the correct place in the text using a LinearLayout.
The cells are generated programmatically in Swift, so I pre-process the HTML when it is pulled from the web and store the NSMutableAttributedStrings in an array, along with the estimated heights, to keep scrolling smooth. I ran across NSTextTable in Apple's documentation, but I cannot figure out how to use this class, or whether an instance can be embedded in an NSMutableAttributedString so that the table can be accessed later.
If I'm totally misunderstanding what that class does, please let me know.
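For reference, the block-splitting step described for the Android version might look roughly like the sketch below. It is written in TypeScript only to show the logic, and it assumes that the split happens at <table> tags and that tables are never nested. HtmlBlock and splitAtTables are made-up names, not part of any framework.

```typescript
// Hypothetical block type: a run of ordinary HTML or one whole <table> element.
type HtmlBlock = { kind: "text" | "table"; html: string };

// Split an HTML string into alternating text and table blocks.
// Assumption: tables are not nested; a nested <table> would end the match early.
function splitAtTables(html: string): HtmlBlock[] {
  const blocks: HtmlBlock[] = [];
  const tablePattern = /<table\b[^>]*>[\s\S]*?<\/table>/gi;
  let cursor = 0;
  let match: RegExpExecArray | null;
  while ((match = tablePattern.exec(html)) !== null) {
    // Anything between the previous table and this one is ordinary content.
    if (match.index > cursor) {
      blocks.push({ kind: "text", html: html.slice(cursor, match.index) });
    }
    blocks.push({ kind: "table", html: match[0] });
    cursor = match.index + match[0].length;
  }
  // Trailing content after the last table, if any.
  if (cursor < html.length) {
    blocks.push({ kind: "text", html: html.slice(cursor) });
  }
  return blocks;
}
```

Each "text" block could then go through the attributed-string conversion, while each "table" block is handed to whatever view builds the table.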

Related

How does header and footer printing work in Puppeteer's page.pdf API?

I've noticed a few inconsistencies when trying to use the headerTemplate and footerTemplate options with page.pdf:
The DPI for headers and footers seems to be lower (72 vs 96 for the main body, I think). So if I'm trying to match the margins, I have to scale by that.
Styles are not shared with the main body so I have to include them in the template.
If I try to use a locally stored font, it works on the main body but not in the header/footer even if I include the same CSS in the header/footer template.
I suspect that this happens because headers and footers are treated as separate documents and converted to image/pdf separately (https://cs.chromium.org/chromium/src/components/printing/resources/print_header_footer_template_page.html also implies something like that). Can someone familiar with the implementation explain how it actually works? Thanks!
Short Answer:
Puppeteer controls Chrome or Chromium over the DevTools Protocol.
Chromium uses Skia for PDF generation.
Skia handles the header, set of objects, and footer separately.
Detailed Answer:
From the Puppeteer Documentation:
page.pdf(options)
options <Object> Options object which might have the following properties:
- headerTemplate <string> HTML template for the print header. Should be valid HTML markup with following classes used to inject printing values into them:
  - date formatted print date
  - title document title
  - url document location
  - pageNumber current page number
  - totalPages total pages in the document
- footerTemplate <string> HTML template for the print footer. Should use the same format as the headerTemplate.
returns: <Promise<Buffer>> Promise which resolves with PDF buffer.
NOTE Generating a pdf is currently only supported in Chrome headless.
NOTE headerTemplate and footerTemplate markup have the following limitations:
- Script tags inside templates are not evaluated.
- Page styles are not visible inside templates.
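Since page styles are not visible inside templates, every style the header needs has to travel inside the template string itself. A minimal sketch of building such a string is below. The class names (title, pageNumber, totalPages) come from the documentation quoted above; the function name, the font size, and the padding are illustrative assumptions.

```typescript
// Build a header or footer template string with all styling inlined.
// buildHeaderTemplate and its parameters are hypothetical names for illustration.
function buildHeaderTemplate(fontSizePx: number, sidePaddingPx: number): string {
  // Page styles do not reach the template, so nothing here relies on external CSS.
  const style = [
    `font-size: ${fontSizePx}px`,
    "width: 100%",
    `padding: 0 ${sidePaddingPx}px`,
    "display: flex",
    "justify-content: space-between",
  ].join("; ");
  return (
    `<div style="${style}">` +
    `<span class="title"></span>` +
    `<span><span class="pageNumber"></span> / <span class="totalPages"></span></span>` +
    `</div>`
  );
}
```

The result would be passed as headerTemplate (or footerTemplate) to page.pdf(), together with displayHeaderFooter: true and margins large enough to leave room for it.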
We can learn from the Puppeteer source code for page.pdf() that:
The Chrome DevTools Protocol method Page.printToPDF (along with the headerTemplate and footerTemplate parameters) is sent to page._client.
page._client is an instance of page.target().createCDPSession() (a Chrome DevTools Protocol session).
From the Chrome DevTools Protocol Viewer, we can see that Page.printToPDF contains the parameters headerTemplate and footerTemplate:
Page.printToPDF
Print page as PDF.
PARAMETERS
- headerTemplate string (optional)
  HTML template for the print header. Should be valid HTML markup with following classes used to inject printing values into them:
  - date: formatted print date
  - title: document title
  - url: document location
  - pageNumber: current page number
  - totalPages: total pages in the document
  For example, <span class=title></span> would generate span containing the title.
- footerTemplate string (optional)
  HTML template for the print footer. Should use the same format as the headerTemplate.
RETURN OBJECT
- data string
  Base64-encoded pdf data.
The Chromium source code for Page.printToPDF shows us that:
The Page.printToPDF parameters are passed to the sendDevToolsMessage function, which issues a DevTools protocol command and returns a promise for the results.
After further digging, we can see that Chromium has a concrete implementation of a class called SkDocument that creates PDF files.
SkDocument comes from the Skia Graphics Library, which Chromium uses for PDF generation.
The Skia PDF Theory of Operation, in the PDF Objects and Document Structure section, states that:
Background: The PDF file format has a header, a set of objects and then a footer that contains a table of contents for all of the objects in the document (the cross-reference table). The table of contents lists the specific byte position for each object. The objects may have references to other objects and the ASCII size of those references is dependent on the object number assigned to the referenced object; therefore we can’t calculate the table of contents until the size of objects is known, which requires assignment of object numbers. The document uses SkWStream::bytesWritten() to query the offsets of each object and build the cross-reference table.
The document explains further down:
The PDF backend requires all indirect objects used in a PDF to be added to the SkPDFObjNumMap of the SkPDFDocument. The catalog is responsible for assigning object numbers and generating the table of contents required at the end of PDF files. In some sense, generating a PDF is a three step process. In the first step all the objects and references among them are created (mostly done by SkPDFDevice). In the second step, SkPDFObjNumMap assigns and remembers object numbers. Finally, in the third step, the header is printed, each object is printed, and then the table of contents and trailer are printed. SkPDFDocument takes care of collecting all the objects from the various SkPDFDevice instances, adding them to an SkPDFObjNumMap, iterating through the objects once to set their file positions, and iterating again to generate the final PDF.
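To make the quoted passages a bit more concrete, here is a rough sketch of why the cross-reference table has to be written last: each object's byte offset is only known once everything before it has been written. This is not Skia's code, and the output is not a complete, valid PDF (there is no catalog or page tree). layoutPdfLike is a hypothetical name, and the sketch assumes ASCII-only object bodies so that string length equals byte length.

```typescript
// Illustrative three-step layout: number the objects, write header + objects
// while recording offsets, then write the cross-reference table and trailer.
function layoutPdfLike(bodies: string[]): { output: string; offsets: number[] } {
  const header = "%PDF-1.4\n";
  let output = header;
  const offsets: number[] = [];

  // Each body gets the next object number (1-based) and its start offset is recorded.
  bodies.forEach((body, index) => {
    offsets.push(output.length);
    output += `${index + 1} 0 obj\n${body}\nendobj\n`;
  });

  // The table of contents can only be produced now that every offset is known.
  const xrefStart = output.length;
  output += `xref\n0 ${bodies.length + 1}\n`;
  output += "0000000000 65535 f \n";
  for (const offset of offsets) {
    // Offsets are zero-padded to 10 digits, as in real cross-reference entries.
    output += `${("0000000000" + offset).slice(-10)} 00000 n \n`;
  }
  output += `trailer\n<< /Size ${bodies.length + 1} >>\nstartxref\n${xrefStart}\n%%EOF\n`;
  return { output, offsets };
}
```

Note that the "header" and "footer" in this structure are parts of the PDF file layout, which is a different thing from the printed page header and footer produced by headerTemplate and footerTemplate.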
Thanks to the other answer (https://stackoverflow.com/a/51460641/364131) and codesearch, I think I found most of the answers I was looking for.
The printing implementation is in PrintPageInternal. It uses two separate WebFrames — one to render the content, and one to render the header and footer. The rendering for the header and footer is done by creating a special frame, writing the contents of print_header_and_footer_template_page.html to this frame, calling the setup function with the options provided and then printing to a shared canvas. After this, the rest of the contents of the page are printed on the same canvas within the bounds defined by the margins.
Headers and footers are scaled by a fudge_factor which isn't applied to the rest of the content. There might be something funny going on here with the DPIs (which might explain the fudge_factor of 1.33333333f which is equal to 96/72).
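The arithmetic behind that guess is simple enough to write down. The sketch below only encodes the 96/72 ratio; which direction you need to scale in (multiply or divide) is an assumption you would have to confirm by measuring the output, and all names here are made up for illustration.

```typescript
// Ratio between the two resolutions mentioned above: 96 / 72, about 1.3333.
const BODY_DPI = 96;
const HEADER_FOOTER_DPI = 72;
const FUDGE_FACTOR = BODY_DPI / HEADER_FOOTER_DPI;

// Multiply a length by the ratio (for example 72 becomes 96).
function scaleUpByFudgeFactor(length: number): number {
  return length * FUDGE_FACTOR;
}

// Divide a length by the ratio (for example 96 becomes 72).
function scaleDownByFudgeFactor(length: number): number {
  return length / FUDGE_FACTOR;
}
```

For instance, if a 40px body margin looks too large or too small when the same value is reused in a header template, trying scaleDownByFudgeFactor(40) (30) and scaleUpByFudgeFactor(40) (about 53.3) shows quickly which direction matches.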
I'm guessing this special frame is what prevents the header and footer from sharing the same resources (styles, fonts, etc.) as the contents of the page. It probably isn't set up to load (and wait for) any additional resources requested by the header and footer templates, which is why the requested fonts don't load.
I did a lot of research on this issue and finally implemented a small library that works around it with a small hack:
It creates two PDF files. The first one is the HTML content without the header and footer. The second one contains the header and footer, repeated once for each page of the content PDF. The library then merges the two files together.
You can find it here:
https://github.com/PejmanNik/puppeteer-report

Objective-C: Create UITextViews and UIImageViews from HTML string

So I am fairly new to working with iOS applications and I am currently working on an application which pulls data from a website using NSURLSessionDataTask, parses the JSON response (HTML), and then populates a view with data from the response.
Currently, I am stuck trying to find a way to correctly parse my HTML string (of type __NSCFString/NSString) from the JSON response, putting the text into one or more UITextViews and the images into one or more UIImageViews within the main ViewController.
It has previously been suggested to simply use a UIWebView to display everything or to use an outside library to do some of the converting, however I am curious if there are any methods by which I could parse the HTML and extract all relevant text or images and throw them into an array for later use. I would very much prefer to keep all of this native and not use outside libraries unless absolutely necessary.
I have also seen some use of the NSAttributedString class for parsing HTML from a JSON response, but am unsure if this is even relevant to what I am trying to do here. Any help/suggestions/thoughts are appreciated!
You can use the native NSXMLParser class. Check out its documentation.
Within your custom parsing class, you can generate NSAttributedStrings and store them in an array based on your own logic. This should help.

How can I import html content to pdf template?

I created a PDF template with OpenOffice Draw. It has text boxes, and I can set their values through AcroFields, but I can't import HTML content into the template.
I can convert HTML content to a PDF file, but how can I do that with a template?
My problem is with the template: my HTML content also has to be placed at a specific position on the page, for example in the center.
Thanks
I am not quite sure if I understand your question, but it seems like you need some kind of template where you will enter your content.
My thinking goes to OpenXML as the best fit. But since it is rather complex you can save some time by using third party tools.
From my experience, Docentric gives you good value for the money. You can prepare a template in Word and then merge it with data from any source that can fit into .NET object. Your document can be converted to pdf or xps if required.
Templates are generated in MS Word (2007 or newer) using special Docentric Add-in for template generation. All MS Word formatting can be applied here. Placeholders for data are set where the data will appear at runtime.
The process is straightforward, so even end users can design reports. Developers can then focus on bringing data in from various sources (database, XML). Check the product documentation for ideas on how to use it.

Text heavy iOS App. Store text in HTML, Plist, or Other?

I'm writing a relatively complex iOS app that is very text heavy.
The text is also heavily formatted. It has lots of color, size, font, and spacing changes, as well as bulleted lists and other text features you'd expect to see in a very rich website.
The text is displayed on about 40 different views. Some of which display a lot of text, others a little. There is no one template that all the pages follow. (There are some that are similar, but that's not the point.)
Lastly, the text is constantly being changed and updated by an editorial team during development, not so much after release. The text has to be stored on the device; downloading files is not an option.
My question is, what is the best way to store and then render all this text in an iOS App?
My approach
Store all the text content and formatting info in an html file and use
[[NSAttributedString alloc] initWithFileURL:htmlDoc
                                    options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType}
                         documentAttributes:&attrDict
                                      error:&error];
to create an NSAttributedString and use that to populate UITextViews.*
*Note: I would do some more work before creating the UITextViews. First I would parse it to find the appropriate page number [[Page:1.3]] and then parse the elements in that section [[header]], [[side_scroller]], etc.
I like this approach for two main reasons:
It created a separate copy document that contained all the text and formatting info.
I'm the only iOS developer, but we have a couple front-end developers. So when we get slammed with changes that need to be done in 3.45 minutes, I could have some of the guys help me make the changes, without having to know all the nuances of UIFont and related classes. Occasionally, the editors could even make the changes themselves :)
Minor reasons for liking this approach:
The text can vary so much per page, that creating a new UIFont + Plist entry to store the formatting info seems like a bigger pain than having everything in a .html document. (I could be wrong about this.)
Project managers will inevitably say: "Make this word a little bigger," "This word looks strange, add italics," and "Make everything purple!" HTML/CSS seems like a more flexible solution for quickly implementing these requests.
Downsides of this approach:
NSAttributedString picks up 99% of the HTML attributes I threw at it. It did not pick up bullet spacing changes in unordered lists <ul>.
Plists are more performant.
Here are some other approaches I considered:
Plist + UIFont
RTF Document - Originally started with this, but found it hid a lot of what was going on and NSAttributedString wouldn't pick up some of the changes.
XML
Any advice or input would be very appreciated.
Notes:
iPad app,
iOS 7,
No Internet Connectivity,
Xcode 5
What I did to store styled text in an iOS app was to write a Mac OS command line tool that opens RTF files and converts them to attributed strings (It's a 1-line call in Mac OS, but not supported in iOS for some reason.) I then use NSCoding to save the attributed strings as binary data, with a special .DATA filetype.
I created a custom UITextView category with a method that knows how to load the text view's attributed text from my custom filetype.
I created a build rule in my project that treats RTF files as source files in a build step and the .DATA filetype as the output, and copies the .DATA files into the build project.
Now, all I have to do is add an RTF file to my project, and the build process inserts the .DATA version of the styled text into the executable.
The Xcode editor knows how to edit RTF files, so you can edit them right in place in the IDE, OR you can edit them in TextEdit or any editor that supports RTF files.
There are a few things you can put in an RTF that aren't supported in UITextViews. (I don't remember what those are offhand. Sorry.)
I find styled WYSIWYG text much easier to deal with than HTML. You just edit the text, and the build process picks up the changes.
It worked beautifully. Plus, binary NSCoding output is a whole lot more compact than HTML.
I would recommend using a web view. It can open files in the resource bundle.
You can disable all the links in the HTML by implementing the delegate method shouldStartLoadWithRequest to return NO.
You might also want to set dataDetectorTypes to UIDataDetectorTypeNone.
That will disable automatic link detection in the web view.

Design a Java webapp that prints HTML signs and labels

I want to design a webapp that can print signs for various products, for example in a big store.
The content of the signs (product names, descriptions, prices) comes from the server and changes daily. Each product can be printed on an A3 or A4 document.
It is also possible to have 3 signs on one A4 page.
In addition, each product type has a differently designed sign (TVs have the price at the top of the page in red, and printers have the price at the bottom left in bold).
The idea is that the program will get the product data from the DB, push it into an HTML template according to the page size and product type, and print the HTML (or convert the HTML to PDF and print that).
Some problems I have faced so far:
- Text fields from the DB can be too long and overlap an area with other text or scramble the rest of the sign.
- There are many product types, and each one has its own HTML design and CSS, so it's very hard to maintain if I need to change things.
- Different browsers show the sign differently.
- Different printers print the sign differently.
What would be the best way to approach the problem? Could CSS frameworks help?
I'm open to ideas.
I've developed an app that does printing, and HTML layout is about the last path I would take. HTML printing loses elements such as backgrounds and positioning very randomly, and the result depends on the printer brand and driver. If you're serious about going this route, the only two paths I'd consider are PostScript or Adobe PDF. HTML can be a valid "preview", but there again you will be fighting against the discrepancies between how the browsers render your code to the screen--no two are the same. Best still to generate a .pdf and just display it.
On my app, I do general layout snapped to a draggable grid in JavaScript, then output the coordinates and elements to a database that my (very specialized) printer picks up via an automated text document FTP and reassembles using a proprietary print server. From there, the print server puts all the elements together, positions them via the grid, and outputs the job. It's been months in the making and a huge pain to build, but the outcome is just what my company needed for custom printing on demand. We train all our users to understand that layout is not guaranteed to be perfect like InDesign or Quark, and even then we get occasional complaints. Bottom line--the web wasn't made to be a print layout tool!
Use XML + XSLT with a server-based transformation.
Keep the data in standard XML (put that XML in the DB).
Keep the style in XSLT (select the XSLT depending on the product type or company).
This could get pretty complex, but you can apply style templates in the form of XSLT.
Most browsers support this if you do the transformation on the server side and stream the result.
If you want PDF, HTML, or Word documents to be generated, write XSL-FO and render it with a formatter such as Apache FOP (Apache Xalan can run the XSLT step).