Converting Webpage to PDF - html

I have a project and the old programmer thought converting a webpage to PDF would be easy using web-based conversion software. I'm not so sure since it requires headers/footers and it's a listings page, so it will need to know when to & when not to page break, or else it will start new pages halfway through an item on the list. I've also had problems with it cutting up images between two pages.
I've tried convincing the client that the requirements are too much and we need to create the PDF using PHP, but they are convinced building a page in HTML and converting it will work.
So I want to know whether there is any web-based conversion software that supports converting HTML with headers/footers and lets me tell it where it may and may not insert a page break.
Thanks.

There are plenty of SaaS services out there. Here's another SaaS one I highly recommend.
It's htm2pdf.co.uk and they have both a PDF API (that works with HTTP GET and supports all platforms) as well as an HTML to PDF SDK (that works with HTTP POST and is only available in PHP).
It is based on WebKit and therefore supports anything WebKit does. WebKit is what browsers like Safari & Chrome are based on. It supports headers, footers, page breaking and so on, as well as additional PDF features like encryption and watermarking.
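For the page-break part of the question, converters built on a browser engine generally take their hints from CSS print rules. Here is a minimal sketch, assuming the chosen converter honors `break-inside` (or the older `page-break-inside`); the `build_listing_html` helper, the `item` class name and the (title, description) data shape are made up for illustration, and headers/footers are not covered because each service handles those differently.

```python
from html import escape

# Print rules: ask the renderer not to split a listing item or its image
# across two pages. `page-break-inside` is the older alias of `break-inside`.
PRINT_CSS = """
@page { size: A4; margin: 20mm; }
.item { break-inside: avoid; page-break-inside: avoid; }
.item img { max-width: 100%; break-inside: avoid; page-break-inside: avoid; }
"""


def build_listing_html(items):
    """Return a complete HTML page for (title, description) pairs (hypothetical shape)."""
    blocks = []
    for title, description in items:
        blocks.append(
            '<div class="item"><h2>' + escape(title) + "</h2><p>"
            + escape(description) + "</p></div>"
        )
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n<style>" + PRINT_CSS + "</style>\n</head>\n<body>\n"
        + "\n".join(blocks)
        + "\n</body>\n</html>"
    )


if __name__ == "__main__":
    print(build_listing_html([("Listing 1", "Two rooms & a balcony")]))
```

Whether the breaks land where you expect still has to be checked against the actual converter, since support for these properties varies between rendering engines.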

I work at Expected Behavior, and we have a product called DocRaptor that converts HTML code to PDF with an HTTP POST request. DocRaptor can definitely handle headers, footers and page breaks. DocRaptor is a SaaS application, and every plan has a 30-day trial.
Here's a link to DocRaptor's home page:
DocRaptor
And a link to our coding examples:
DocRaptor coding examples

Related

How to migrate from Office documents to documents based on modern web technologies - advice welcome

Currently, all documentation is based on MS Office. This makes it quite challenging if you want to integrate some functionality: you either go with VBA or with VSTO. The first is not that comfortable; the second can be like taking a sledgehammer to crack a nut.
Simple things like simple controls, hiding text or basic maths can be easily realized by HTML.
So I would need an HTML text processor that focuses on content (text) and allows me to add interactivity when I need it. That means switching to source code or showing additional panels only on request, so that the author can focus on the textual content (a person more familiar with programming would do the formatting/interactivity).
In the long term, I want to have the ability to integrate things like SQLite and API calls.
In addition, the output has to be in a single file otherwise it isn't portable in a practical way and users (who only fill in data) won't accept it.
I did some research and concluded that there isn't an all-in-one solution; instead, there are several options that each meet some of my requirements.
I wonder which is best to realize my long-term goals.
HTML5 offline app
It looks like I could develop an offline HTML5 app, which is well explained, e.g. here:
Offline web applications: a working example
Tutorial: How to make an offline HTML5 web app, FT style
Creating HTML5 Offline Web Applications
How to Build an Offline Single-page Website
plus some background information on Single page apps in depth
markdown
The content could be generated in a markdown editor as recommended in What's a good, auto-saving, WYSIWYG HTML word processor? or by simply converting Office documents to HTML5.
HTML editor / site designer
Alternatively, I could use an HTML editor or a visual site designer
but the selection isn't exactly a small one for me to choose from.
I found some help in:
Battle of the Text Editors: Atom, Sublime & Brackets
26 Tools and Frameworks for HTML-based Desktop and Web App Interfaces
Free HTML Editors: The 16 Best for Web Developers on Windows
14 Best Free HTML Editors
or even simpler for the standard office user: GrapesJS - Next generation tool for building templates without coding
single file website (app)
In my understanding there is still the problem that the output won't be a single file, will it?
I could make use of the archive formats, but they aren't supported by all applications, as explained in What's the best “file format” for saving complete web pages (images, etc.) in a single archive?.
A single file could work if I only do simple things and embed media as Base64-encoded objects, but that comes with the disadvantage of a large size overhead.
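To make the Base64 trade-off concrete, here is a small sketch of how a media file can be inlined as a `data:` URI and how much larger it gets. The helper names and the placeholder bytes are made up for illustration; Base64 turns every 3 bytes into 4 characters, so the overhead is roughly one third.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a data: URI usable in an <img src="..."> attribute."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def base64_overhead(data: bytes) -> float:
    """Return the size of the Base64 text divided by the size of the original bytes."""
    return len(base64.b64encode(data)) / len(data)


if __name__ == "__main__":
    # 3072 placeholder bytes standing in for a real image file.
    fake_image = bytes(range(256)) * 12
    uri = to_data_uri(fake_image, "image/png")
    print(f'<img src="{uri[:48]}...">')
    print(f"size ratio: {base64_overhead(fake_image):.2f}")
```

For a handful of small icons this is usually acceptable; for large images or video the extra third adds up quickly.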
Desktop web apps
If I want to realize more complex things I would need to develop desktop apps using HTML/CSS/JavaScript but once again the selection is little as you can see in How to develop Desktop Apps using HTML/CSS/JavaScript? [closed].
I haven't found anything yet telling me if I can deploy a desktop web app as a single file.
A4 layout
The layout would be based on the information given in How to make an HTML Page in A4 paper size page(s)?
office to HTML / html word processor
XSweet - The open .docx to HTML conversion tool
Wax / Wax II (web-based word processor)
HTML to office
For the transition phase, it would be nice to be able to transfer HTML to e.g. docx. Some options are mentioned in the references below:
How to convert HTML file to word?
Convert html to docx using pandoc
html-docx-js
Convert Html to Docx in c#
So how shall I proceed?

Approach to building a GUI for a web application

First, a short disclaimer. I have next to no knowledge about web applications. I come from an iOS background where I exclusively wrote native code, so if you write your answers like I know nothing outside the shallow parts, that would be great.
I'm interested in learning a stack to develop web applications, but I'm not sure what the right way to build the GUI is. I know that a web front end consists of html and CSS to create the display and javascript as the bridge between the back end and the GUI, but I don't know the best way to put something together.
I know in iOS, you can use the Interface Builder (part of Xcode that lets you graphically create the XML that describes the display) to create GUIs without any knowledge of how iOS translates the XML to some rendering, or even what is written in your XML files. Is there any analog to the web front end?
I'm mainly just looking for a list of the accepted ways to develop the GUI for a web application. If I have to learn HTML and CSS, so be it, but I'd like to know what my options are and the tradeoffs between each of them.
I can answer shortly stating that (technically) you can design web pages without coding in HTML or CSS, or even Javascript - although, you would be somewhat limited in your creative abilities and applications.
You can read about WYSIWYG html editors on this link, or try out ckeditor (someone said it's good)...
...I think a bit of background will help you reach a correct decision...
so here goes:
The Long Answer
I would start by trying to put the world of web programming and design into concepts that correlate with iOS coding.
If we look at the whole of a web app from an MVC perspective, then the browser is the view, the server is the controller and the database is the model... although this is very simplified.
Just like in iOS, each of these can be (but doesn't have to be) broken down into sub-MVC systems.
Just like in any MVC system, the view (the browser) can talk to the model (the database), but really it shouldn't. That's just bad practice.
If we break down the main-view (the browser) to a sub-MVC system, I would consider the HTML as the model, the CSS as the view and the browser (through links and javascript) as the controller.
It's not all that clear cut, but thinking like this helps me practice better and cleaner coding.
The HTML is the view's model for the web-app - it contains the data to be displayed or used.
HTML is a markup language that closely resembles XML (both descend from SGML, and XHTML is the strict XML flavor of HTML), and it contains data organized in a similar way to an XML file.
The basic HTML file will contain:
<html>
<head>
</head>
<body>
</body>
</html>
This should look familiar to you if you have read any XML.
The CSS (cascading style sheets) is the view - it states HOW the html DATA should be displayed.
If your web app doesn't have any CSS, the browser's default styling will be applied to the data in the HTML.
This "language" makes me think more about dictionaries in iOS (I think that's what they're called in Objective C). They have properties and values (like key-value pairs) that determine how the HTML data is displayed (if it's displayed).
They could look something like this:
body
{
color: white;
background-color: black
}
The browser is the web-app's view's controller - it makes it all work together and serves it up to the screen.
Javascript and links help us tell the controller what we want it to do, but it is the browser that acts (and willfully at times).
You can have a whole web app that acts without javascript, using only the default actions offered by links - in which case the browser will usually ask the server (the main controller) what to do.
Javascript helps us move some of the legwork from the server to the client, by allowing us to have a "smarter" controller for our view - just like in iOS.
The issue of the errant main-view / browser
Not all views are created equal, and not all browsers are the same.
Because the browser is used as the controller for the web-app's view, and because some browsers act differently than others, we web coders have the problem of working around someone else's idea of how our view's controller should behave.
You might see us complaining about it quite a lot (especially complaining about Internet Explorer).
These days, this issue is not as big of a problem as it used to be... it's just that some people don't update their computers...
WYSIWYG web editors
There are website builders and editors that try to work like Xcode does, by allowing us to build the website much like we would write Word documents.
But, unlike Xcode, which codes only the graphical interface of the view, these website builders write the model as well and usually add JavaScript into the mix.
When we use these tools (which I avoid), the whole MVC model breaks apart.
We can use them as a starting point for dirty work, but then we take their code apart and adjust it to our needs - usually by taking the code we need for the view (CSS) and applying it where we need it (and discarding much of the nonsense they add to the code and the HTML).
To summarize
As you can tell, HTML and CSS (and Javascript) are only a small part of a web app - as they all relate to the main view of the web app.
To write the controller and models for web apps, we use other tools (such as Ruby, PHP, Node.js, MySQL and the like).
Coding the HTML and CSS isn't as hard as you might think, although it might be harder than I present it to be.
You can avoid writing code for web apps and use applications that offer WYSIWYG (What You See Is What You Get), but that would limit your abilities and will take away from your control over what you want to create.

ExtJs and Sencha Touch Search Engine Optimization

I've started learning ExtJS 4 and Sencha Touch 2, and I really like them.
The main difference between Sencha products and jQuery (and others) is that instead of enhancing preexisting HTML, Sencha generates its own DOM based on objects created in JavaScript.
Apps developed like this are great as intranet apps, but can you create a consumer-oriented website using Sencha (like an online store)?
I see that you don't write any HTML code in ExtJS or Sencha Touch, so I am wondering how a page generated entirely by JavaScript can be indexed by search engines like Google. As far as I know, the Google bot only sees the plain HTML code.
Is there any way to do SEO for a Sencha web app?
Kind Regards,
Dan Cearnau
Nothing is impossible. You just need to do some work.
1. Generate a standard static page using PHP or something else. The page should look like the page of your ExtJS app, but all links must have GET params in the URL. PHP should also process the incoming GET params.
2. Add your ExtJS app to the page. In the app you have to take the GET params into account and make the proper request.
2a. If a real user opens your page: PHP generates the output, then the ExtJS app starts, hides the static page and generates the dynamic output.
2b. If a crawler opens your page, JS is disabled, so PHP processes the request according to the GET params and generates the output.
You can add params to the URL like #param1&param2&param3 in ExtJS when users click on links, so real users will be able to share their links. Just teach the router on the PHP side to understand URLs like this.
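Step 2b above can be sketched as follows. This is only an illustration of the idea, written in Python rather than PHP; the product list, the `category` parameter and the `render_static_page` function are all hypothetical and stand in for whatever the real store would query and render.

```python
from html import escape
from urllib.parse import parse_qs, urlencode

# Hypothetical catalogue standing in for a database query.
PRODUCTS = [
    {"name": "Lamp", "category": "home"},
    {"name": "Desk", "category": "home"},
    {"name": "Bike", "category": "sport"},
]


def render_static_page(query_string: str) -> str:
    """Build crawler-readable HTML for the state described by the GET params."""
    params = parse_qs(query_string)
    category = params.get("category", [None])[0]

    # The same filtering the JavaScript app would do, done on the server.
    items = []
    for product in PRODUCTS:
        if category is None or product["category"] == category:
            items.append("<li>" + escape(product["name"]) + "</li>")

    # Every link carries its state as GET params, so it works without JS.
    links = []
    for name in sorted({p["category"] for p in PRODUCTS}):
        href = "?" + urlencode({"category": name})
        links.append('<a href="' + escape(href) + '">' + escape(name) + "</a>")

    return "<nav>\n" + "\n".join(links) + "\n</nav>\n<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(render_static_page("category=sport"))
```

The JavaScript app would then read the same parameters on startup and replace this static markup with its own widgets.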
There is no way to make SEO-friendly pages using JavaScript only.
With a full-blown app, SEO would be close to impossible. Such apps are far too dynamic. Search engines work by indexing pages. They can deal with some Ajax stuff by supporting pages with #s, but imagine how many pages a fully functional app will have. Every view you have has 100s of options that would each constitute a new page, which also has 100s of options. All these virtual pages would most likely be just slight variations of other pages: a different sort order, a different filter, a moved panel, a search option.
If you use ExtJs to enhance a website like jQuery is often used, then that's a different story. You will have html for the spiders to read and then you enhance how the content works via javascript (see progressive enhancement).
Actually, in Touch 2 you can define paths and use history support. This will treat sections of your app as actual pages in the browser, with standard functionality like going back in the browser etc. This will be your best bet when working with mobile SEO.
Getting any kind of SEO out of a Sencha app is impossible since it builds everything on the fly. Even if you use the history support in Sencha Touch, that's also done on the fly and has no effect on SEO.
For consumer-facing websites, Sencha is not the answer. For the back end (for managing the shopping cart, maybe), it's a different story.

Converting XML to HTML

I have made a 'RSS Feed' app through parsing XML. Now I want to load the content into a UIWebView within a detail view, but not as a generic browser. I know that HTML content can be loaded in a UIWebView, so I want to convert the XML feed's content to HTML content and load it in the web view.
How could this be done?
Generally, you can convert XML to XHTML using XSLT.
Unfortunately, as the following SO questions show, using the existing libxslt (and having a solution fully on the client) is not allowed on iOS for some reason. Or at least, has not been allowed in the past -- no idea if this is still true for iOS 4.3+.
Version of XSLT in iPhone
Alternative to NSXMLDocument on the iPhone for XSLT purposes
How do I include libxslt in my iPhone app?
No XSLTProcessor() support in Safari?
You may need to implement a server-side solution. (Which might not necessarily be a bad thing, since doing the work server-side means faster clients.)
The RSS feed should have a "link" tag for every feed item, which holds the URL of that specific news item. You can just load that link in a UIWebView.
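If XSLT is not available, another option is to build the HTML string by hand from the fields you have already parsed. Here is a minimal, language-neutral sketch of that idea in Python; the sample feed and the `rss_items_to_html` helper are made up for illustration, and it assumes the description is plain text (a feed that ships HTML inside the description would need different handling). The resulting string is the kind of content a web view can load as HTML.

```python
import xml.etree.ElementTree as ET
from html import escape

# Tiny made-up RSS 2.0 feed used only for this example.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <description>Hello &amp; welcome</description>
    </item>
  </channel>
</rss>"""


def rss_items_to_html(rss_text: str) -> list:
    """Return one small HTML document per <item> in the feed."""
    root = ET.fromstring(rss_text)
    pages = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        description = item.findtext("description", default="")
        pages.append(
            "<html><body>"
            "<h1>" + escape(title) + "</h1>"
            "<p>" + escape(description) + "</p>"
            '<p><a href="' + escape(link) + '">Read the full article</a></p>'
            "</body></html>"
        )
    return pages


if __name__ == "__main__":
    for page in rss_items_to_html(SAMPLE_RSS):
        print(page)
```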

How can I save a webpage as an image in my rails app?

In my rails app I have a need to save some webpages and display them to the user as images. For example, how would I save www.google.com as an image?
There is a command-line utility called CutyCapt that uses the WebKit rendering engine to render HTML pages into various image formats. Maybe this is for you?
http://cutycapt.sourceforge.net/
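Calling CutyCapt from an application comes down to shelling out with the page URL and an output path. The following is a rough sketch in Python (the same argument list applies from any language); the function names are made up, the `CutyCapt` binary is assumed to be installed and on the PATH, and on a server without a display you may additionally need a virtual X server, which is not shown here.

```python
import shutil
import subprocess


def cutycapt_command(url: str, out_path: str, binary: str = "CutyCapt") -> list:
    """Build the argument list for one capture using CutyCapt's --url and --out options."""
    return [binary, f"--url={url}", f"--out={out_path}"]


def capture(url: str, out_path: str) -> None:
    """Run CutyCapt; raises if the binary is missing or the capture fails."""
    cmd = cutycapt_command(url, out_path)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("CutyCapt binary not found on PATH")
    # The timeout keeps a slow or hanging page from blocking the caller forever.
    subprocess.run(cmd, check=True, timeout=60)


if __name__ == "__main__":
    # Only prints the command, so the example runs without CutyCapt installed.
    print(" ".join(cutycapt_command("http://www.google.com", "google.png")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the URL comes from user input.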
This is prohibitively difficult to do in pure Ruby, so you'd want to use an external service for it. Browsershots does it, for example, and it looks like they have an API, although I haven't used it myself. Maybe someone else can chime in with alternative but similar services.
You'll also want to read up on delayed_job or something similar, to make sure you're accessing those page images as a background task and that it doesn't interfere with your actual application.
You can't do it easily (probably can't do it at all).
Each page is just text: HTML data. The view you want to make an image of is a rendered page. The browser renders the page using tons of techniques like HTML parsing, JavaScript parsing, CSS parsing, font rendering, etc. To make a screenshot of the Google page, you would need to do all the rendering somewhere in memory and then take a screenshot of the rendered page.
That task is almost impossible (nothing is fully impossible).
If you are really eager to spend tons of time on that task, you should follow these steps:
1) Find some opensource rendering engine. Firefox would do.
2) Find some way to communicate between ruby-on-rails and that engine.
3) Wire it all together and see the results.
However, I see steps 1 and 2 as nearly impossible.
Firefox addon:
https://addons.mozilla.org/en-US/firefox/addon/1146/