I am trying to implement a C# IDML-to-HTML converter. I've managed to produce a single flat HTML file similar to the one produced by the InDesign export.
What I would like to do is produce HTML that is as close as possible to the InDesign view, like an HTML IDML viewer. To do this, I need to find how much text fits into each text frame. I can extract the story text content, but I can't find a way to split this content into frames/pages.
Is there any way I can achieve that?
Just extracting the text from a story isn't enough. The way the text is laid out is controlled by TextFrames in the Spread documents. Each TextFrame has a ParentStory attribute, showing which story it loads text from, and each frame has dimensions which determine the layout. For unthreaded text frames (i.e. one story <> one frame), that's all you need.
For threaded frames, you need to use the PreviousTextFrame and NextTextFrame attributes to reconstruct the chain. There is nothing in the IDML to tell you how much text fits in each frame of a threaded chain; you need to do the calculation yourself, based on the calculated text dimensions (or by brute-force trial and error).
You can find the spreads in the main designmap.xml:
<idPkg:Spread src="Spreads/Spread_udd.xml" />
And the spread will contain one or more TextFrame nodes:
<Spread Self="udd" ...>
<TextFrame Self="uf7" ParentStory="ue5" PreviousTextFrame="n" NextTextFrame="n" ContentType="TextType">...</TextFrame>
...
</Spread>
Which will in turn link to a specific story:
<Story Self="ue5" AppliedTOCStyle="n" TrackChanges="false" StoryTitle="$ID/" AppliedNamedGrid="n">...</Story>
(In this example the frames are not threaded, hence the 'n' values.)
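As a rough illustration of following PreviousTextFrame/NextTextFrame, here is a minimal Python sketch (the question is about C#, but the logic carries over). The spread fragment and the function name thread_chains are made up for the example, and it only looks at one spread, whereas a real thread can continue onto another spread. It does not attempt the hard part, which is working out how much text fits in each frame.

```python
import xml.etree.ElementTree as ET

# Hypothetical spread fragment; real spreads carry many more attributes.
SPREAD_XML = """
<Spread Self="udd">
  <TextFrame Self="uf9" ParentStory="ue5" PreviousTextFrame="uf8" NextTextFrame="n"/>
  <TextFrame Self="uf7" ParentStory="ue5" PreviousTextFrame="n" NextTextFrame="uf8"/>
  <TextFrame Self="uf8" ParentStory="ue5" PreviousTextFrame="uf7" NextTextFrame="uf9"/>
  <TextFrame Self="ufa" ParentStory="ue6" PreviousTextFrame="n" NextTextFrame="n"/>
</Spread>
"""


def thread_chains(spread_xml):
    """Return {story id: [frame ids in reading order]} for one spread."""
    root = ET.fromstring(spread_xml)
    frames = {f.get("Self"): f for f in root.iter("TextFrame")}
    chains = {}
    for frame_id, frame in frames.items():
        # A chain starts at a frame with no predecessor ("n" means none).
        if frame.get("PreviousTextFrame") != "n":
            continue
        chain, current = [], frame_id
        # Stop at the end of the thread, at a frame that lives in another
        # spread, or if the links loop back on themselves.
        while current != "n" and current in frames and current not in chain:
            chain.append(current)
            current = frames[current].get("NextTextFrame")
        chains[frame.get("ParentStory")] = chain
    return chains


chains = thread_chains(SPREAD_XML)
print(chains)
```

Once you have the ordered chain for a story, you can pour the story text into the frames one after another and measure where each frame overflows.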
All this is in the IDML documentation, which you can find with the other InDesign developer docs here: http://www.adobe.com/devnet/indesign/documentation.html
Microsoft and Adobe have proposed a new CSS module named Regions, which allows you to flow text into multiple containers. Keep in mind that you will never be able to create an HTML page that looks exactly like an InDesign document.
http://www.w3.org/TR/css3-regions/
For now only IE10 and webkit nightly support it: http://caniuse.com/#feat=css-regions
I am trying to automate a workflow for automatically creating HTML newsletters based on information stored in a spreadsheet.
Currently, I am using a newsletter drag-and-drop tool, in which several pre-programmed blocks are available (e.g. full-column block, 2-column block, etc.). When creating a newsletter, I drag and drop a block and fill in my content (e.g. uploading an image, inserting a URL). This is all well and good; however, since I have to create the same newsletter in 10 different languages, this process is quite time-consuming and prone to human error. While all newsletters are the same in terms of layout, the images and URLs differ.
To solve this issue, I would like to get rid of the drag and drop process, and instead automate the workflow in some other way.
One idea that I have already tried, but that doesn't seem like the perfect option to me, is to dynamically create the needed HTMLs in Excel. Basically, the idea is to take the existing block template structure, and put it into Excel with some formulas.
I could then copy and paste the links to the images (in a simple format, such as EN1.jpg, ES1.jpg, etc.), as well as to urls (url.com, url.es).
This is an example block:
<img alt="" align="center" width="700" style="max-width:700px;" class="resetWidth" border="0" src="IMAGE" />
My final expected result is something like this:
I define the layout in a very quick manner (e.g. writing fullcolumn, half column, fullcolumn). The corresponding code is taken from the template. I then provide the attributes (image url, link url) in the form of a list or so. The end result should then be 10 html files that I simply have to upload to the newsletter software.
I would appreciate it very much if anyone had any ideas on this.
Another option for translating the page is to do something like this: https://www.w3schools.com/howto/howto_google_translate.asp
It adds a language selector to the page.
As for automating the images, you could set up a folder for each language and reuse the same image file names in every folder, so that each image ends up in the correct location.
All you'll have to do is replace the images with files of the same names and swap the default language on the Google Translator.
That way the HTML stays the same with regard to the image names.
For the link variables, you may be able to write some JS (or use another language) to take advantage of the
<html lang="">
and, based on which lang is set, insert the matching set of links into the file.
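If you would rather generate the files up front than swap things in the browser, the same idea (fixed layout, per-language image names and links) can be sketched in a few lines of Python. Everything here is an assumption for illustration: the block templates, the halfcolumn markup, the EN1.jpg-style naming and the domain list would all come from your own template and spreadsheet.

```python
from string import Template

# Hypothetical block templates; in practice these would be copied from the
# newsletter tool's existing blocks, with $image and $url placeholders added.
BLOCKS = {
    "fullcolumn": Template(
        '<a href="$url"><img alt="" width="700" style="max-width:700px;" '
        'class="resetWidth" border="0" src="$image" /></a>'
    ),
    "halfcolumn": Template(
        '<a href="$url"><img alt="" width="350" style="max-width:350px;" '
        'border="0" src="$image" /></a>'
    ),
}


def build_newsletter(layout, lang, domain):
    """Render one language version; images follow the <LANG><n>.jpg naming."""
    parts = []
    for index, block_name in enumerate(layout, start=1):
        parts.append(BLOCKS[block_name].substitute(
            image=f"{lang.upper()}{index}.jpg",
            url=f"https://{domain}/",
        ))
    body = "\n".join(parts)
    return f'<html lang="{lang}">\n<body>\n{body}\n</body>\n</html>\n'


# The layout is written once; only the language and domain change.
layout = ["fullcolumn", "halfcolumn", "fullcolumn"]
domains = {"en": "url.com", "es": "url.es"}
pages = {lang: build_newsletter(layout, lang, domain)
         for lang, domain in domains.items()}
print(pages["es"])
```

Each entry in pages can then be written to its own .html file and uploaded to the newsletter software.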
When extracting data you can use CSS selectors or XPaths. But is there a similar, reliable method of doing this on the page source?
www.amazon.com/Best-Sellers-Electronics-Televisions/zgbs/electronics/172659
You could get the page source and then parse it with regex, but that would probably not be reliable if, for instance, a TV did not load on the page. I have looked up various solutions, but I have yet to find one that covers getting the TV at the start of each line (1, 4, 7, etc. in the source) or using a reliable method, e.g. CSS selectors/XPaths, on the source of a page.
What is the gold standard for reliably doing what I am after?
To get the page source you can use CURL if the page is rendered entirely on server side (most pages won't be), or headless chrome to get the actual DOM that will render in the browser (https://developers.google.com/web/updates/2017/04/headless-chrome).
For scraping the content, I've used cheerio (https://github.com/cheeriojs/cheerio) which will allow you to read in HTML to an object and then scrape your data off that using jQuery expressions. (Headless chrome allows you to execute JS on the pages you visit, so you don't necessarily need cheerio).
In your specific example you could get the TV on each line by combining the right class selectors to get the divs containing TVs, and using an attribute selector on 'margin-left=0px', which would get the first item on each line. That is obviously very much bound to the structure of the page and will likely be broken by the smallest of changes in the page source. (It is not really any different from using XPaths, but still better than regex.)
As for certain elements loading or not loading on the page (if that was what you meant by the TV not being there), there are no golden solutions that I know of, except allowing sufficient time for the page to load and making your scraper fail gracefully.
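To make the "class plus margin-left" idea concrete, here is a minimal sketch using only Python's standard-library HTML parser instead of cheerio. The class name item and the sample markup are invented for the example; the real page uses different class names and will need its own selector (in CSS terms, something along the lines of a class selector combined with [style*="margin-left"]).

```python
from html.parser import HTMLParser


class FirstInRowParser(HTMLParser):
    """Collect the text of <div>s with a given class and margin-left:0px."""

    def __init__(self, item_class):
        super().__init__()
        self.item_class = item_class
        self.depth = 0   # nesting depth inside a matching div (0 = outside)
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            # Nested div inside a match: track it so we know when the match ends.
            self.depth += 1
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        style = (attr.get("style") or "").replace(" ", "")
        if self.item_class in classes and "margin-left:0px" in style:
            self.depth = 1
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.items[-1] += data.strip()


# Hypothetical markup: three items per row, the first one has no left margin.
SAMPLE = """
<div class="item" style="margin-left: 0px">TV 1</div>
<div class="item" style="margin-left: 10px">TV 2</div>
<div class="item" style="margin-left: 10px">TV 3</div>
<div class="item" style="margin-left: 0px">TV 4</div>
"""

parser = FirstInRowParser("item")
parser.feed(SAMPLE)
first_in_row = parser.items
print(first_in_row)
```

This is just as fragile as described above: if the site changes how it spaces the items, the match silently returns nothing, so it is worth checking for an empty result.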
I'd like to achieve the following and I'm looking for ideas. I have a document and I want to present its content in a nice SAPUI5 app. My idea is the following: a split app with the paragraph titles in the master view (plus a search function on top) and the respective content in the detail view.
I'd like to know from you if
a) you might want to share your ideas and hints on alternatives.
b) this can be achieved within one single file (i.e. all the code for the split app and document content in one html) and maybe using pure html code (xml also feasible) - against the background of easily handling a large amount of text available in html.
c) if you happen to have/know a reusable template.
Thanks in advance!
An interesting question. I went through a similar exercise once, re-presenting my site with UI5.
To your questions:
(a) I would think that the approach you suggest is a good one
(b) You can indeed include the whole app in a single file. I do that often by using script templates, even with XML Views. You can see some examples in my sapui5bin repository, in particular in the SinglePageExamples folder. Have a look at this html file for example: https://github.com/qmacro/sapui5bin/blob/master/SinglePageExamples/SAP-Inside-Track-Sheffield-2014/end.html
What I would suggest is, rather than intermingle the document content and the app & view definitions, maintain the content of your document separately, for example, in XML or JSON, and use a client side model to load it in and bind the parts to the right places.
I'm writing a relatively complex iOS app that is very text heavy.
The text is also heavily formatted. It has lots of color, size, font, and spacing changes, as well as bulleted lists and other text features you'd expect to see in a very rich website.
The text is displayed on about 40 different views. Some of which display a lot of text, others a little. There is no one template that all the pages follow. (There are some that are similar, but that's not the point.)
Lastly, the text is constantly being changed and updated by an editorial team during development, not so much after release. The text has to be stored on the device; downloading files is not an option.
My question is, what is the best way to store and then render all this text in an iOS App?
My approach
Store all the text content and formatting info in an html file and use
[[NSAttributedString alloc] initWithFileURL:htmlDoc
                                    options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType}
                         documentAttributes:&attrDict
                                      error:&error];
to create an NSAttributedString and use that to populate UITextViews.*
*Note: I would do some more work before creating the UITextViews. First I would parse it to find the appropriate page number [[Page:1.3]] and then parse the elements in that section [[header]], [[side_scroller]], etc...
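To show what that marker-parsing step might look like, here is a small sketch of the splitting logic in Python (in the app itself it would be Objective-C, e.g. with NSRegularExpression). The marker syntax is only guessed from the [[Page:1.3]] and [[header]] examples above, and the sample content is invented.

```python
import re

# Assumed marker syntax, modelled on the [[Page:1.3]] and [[header]] examples.
PAGE_RE = re.compile(r"\[\[Page:([\d.]+)\]\]")
ELEMENT_RE = re.compile(r"\[\[(\w+)\]\]")


def split_pages(document):
    """Return {page number: {element name: html fragment}}."""
    pages = {}
    # re.split with one capture group yields
    # [preamble, id1, body1, id2, body2, ...]
    chunks = PAGE_RE.split(document)
    for page_id, body in zip(chunks[1::2], chunks[2::2]):
        parts = ELEMENT_RE.split(body)
        pages[page_id] = {
            name: fragment.strip()
            for name, fragment in zip(parts[1::2], parts[2::2])
        }
    return pages


# Hypothetical content file with two pages.
SAMPLE = """
[[Page:1.3]]
[[header]]<h1>Title</h1>
[[side_scroller]]<p>First <b>panel</b></p>
[[Page:1.4]]
[[header]]<h1>Next</h1>
"""

pages = split_pages(SAMPLE)
print(pages["1.3"]["side_scroller"])
```

Each fragment would then be turned into an attributed string on its own and handed to the matching view.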
I like this approach for two main reasons:
It creates a separate copy document that contains all the text and formatting info.
I'm the only iOS developer, but we have a couple of front-end developers. So when we get slammed with changes that need to be done in 3.45 minutes, I could have some of the guys help me make the changes, without having to know all the nuances of UIFont and related classes. Occasionally, the editors could even make the changes themselves :)
Minor reasons for liking this approach:
The text can vary so much per page, that creating a new UIFont + Plist entry to store the formatting info seems like a bigger pain than having everything in a .html document. (I could be wrong about this.)
Project managers will inevitably say: "Make this word a little bigger," "This word looks strange, add italics," and "Make everything purple!" HTML/CSS seems like a more flexible solution for quickly implementing these requests.
Downsides of this approach:
NSAttributedString picks up 99% of the HTML attributes I threw at it. It did not pick up bullet spacing changes in unordered lists <ul>.
Plists are more performant.
Here are some other approaches I considered:
Plist + UIFont
RTF Document - Originally started with this, but found it hid a lot of what was going on and NSAttributedString wouldn't pick up some of the changes.
XML
Any advice or input would be very appreciated.
Notes:
iPad app,
iOS 7,
No Internet Connectivity,
Xcode 5
What I did to store styled text in an iOS app was to write a Mac OS command line tool that opens RTF files and converts them to attributed strings (It's a 1-line call in Mac OS, but not supported in iOS for some reason.) I then use NSCoding to save the attributed strings as binary data, with a special .DATA filetype.
I created a custom UITextView category with a method that knows how to load the text view's attributed text from my custom filetype.
I created a build rule in my project that treats RTF files as source files in a build step and the .DATA filetype as the output, and copies the .DATA files into the build project.
Now, all I have to do is add an RTF file to my project, and the build process inserts the .DATA version of the styled text into the executable.
The Xcode editor knows how to edit RTF files, so you can edit them right in place in the IDE, OR you can edit them in TextEdit or any editor that supports RTF files.
There are a few things you can put in an RTF that aren't supported in UITextViews. (I don't remember what those are offhand. Sorry.)
I find styled WYSIWYG text much easier to deal with than HTML. You just edit the text, and the build process picks up the changes.
It worked beautifully. Plus, binary NSCoding output is a whole lot more compact than HTML.
I would recommend using a web view. It can open files in the resource bundle.
You can disable all the links in the HTML by implementing the delegate method shouldStartLoadWithRequest to return NO.
You might also want to set dataDetectorTypes to UIDataDetectorTypeNone.
That will disable automatic link detection in the web view.
I want to design a webapp that can print signs for various products, for example for a big store.
The content of the signs (product names, descriptions, prices) comes from the server and changes daily. Each product can be printed to an A3 or A4 document.
It is also possible to have 3 signs on one A4 page.
In addition, each product type has a differently designed sign (TVs have the price at the top of the page in RED, and printers have the price at the bottom left in BOLD).
The idea is that the program will get the product data from the DB, push it into an HTML template according to the page size and product type, and print the HTML (or convert the HTML to PDF and print).
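The template-filling step described above can be sketched roughly like this in Python. The templates, the field names and the character budgets are all invented for the example. Truncating by character count is only a crude stand-in for real text measurement, but it shows one way to stop long DB fields from spilling over the rest of the sign.

```python
from html import escape
from string import Template

# Hypothetical templates keyed by (product type, page size).
TEMPLATES = {
    ("tv", "A4"): Template(
        '<div class="sign a4"><p class="price red">$price</p>'
        '<h1>$name</h1><p>$description</p></div>'
    ),
    ("printer", "A4"): Template(
        '<div class="sign a4"><h1>$name</h1><p>$description</p>'
        '<p class="price bold">$price</p></div>'
    ),
}

# Assumed per-field character budgets; real limits depend on fonts and layout.
MAX_CHARS = {"name": 30, "description": 120}


def fit(text, limit):
    """Truncate at a word boundary and add an ellipsis when text is too long."""
    if len(text) <= limit:
        return text
    cut = text[: limit - 1].rsplit(" ", 1)[0]
    return cut + "…"


def render_sign(product, page_size):
    """Pick the template for this product type and page size and fill it in."""
    template = TEMPLATES[(product["type"], page_size)]
    return template.substitute(
        name=escape(fit(product["name"], MAX_CHARS["name"])),
        description=escape(fit(product["description"], MAX_CHARS["description"])),
        price=escape(product["price"]),
    )


product = {
    "type": "tv",
    "name": "Big TV",
    "description": "word " * 50,   # deliberately too long
    "price": "$499",
}
sign = render_sign(product, "A4")
print(sign)
```

Keeping the templates in one table keyed by product type and page size also helps with the maintenance problem: shared markup can live in one base template, with only the differences per product type.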
some problems I faced so far:
- Text fields from the DB can be too long and overlap an area with other text or scramble the rest of the sign.
- There are many product types and each one has its own HTML design and CSS, so it's very hard to maintain if I need to change things.
- Different browsers show the sign differently.
- Different printers print the sign differently.
What would be the best way to approach the problem? Could CSS frameworks help?
I'm open for ideas.
I've developed an app that does printing, and HTML layout is about the last path I would take. HTML printing loses elements such as backgrounds, positioning, etc. quite unpredictably, and the result depends on printer brand and driver. If you're serious about this, the only two routes I'd consider are PostScript or Adobe PDF. HTML can be a valid "preview", but there again you will be fighting the discrepancies in how browsers render your code to the screen; no two are the same. Better still to generate a PDF and just display it.
On my app, I do general layout snapped to a draggable grid in JavaScript, then output the coordinates and elements to a database that my (very specialized) printer picks up via an automated text document FTP and reassembles using a proprietary print server. From there, the print server puts all the elements together, positions them via the grid and outputs the job. It's been months in the making and a huge pain to build, but the outcome is just what my company needed for custom printing on demand. We train all our users to understand that layout is not guaranteed perfect like InDesign or Quark, and even then we get occasional complaints. Bottom line: the web wasn't made to be a print layout tool!
Use XML + XSLT with a server-side transformation.
Keep the data in standard XML (put that XML in the DB).
Keep the style in XSLT (select the XSLT depending on the product type).
This could get pretty complex, but it lets you apply style templates in the form of XSLT.
Most browsers support this if you do the transformation on the server side and stream the result.
If you want PDF, HTML, or Word documents to be generated, write XSL-FO and use Apache Xalan to run the transformation; an XSL-FO processor such as Apache FOP can then render the output.