Getting the web page content, similar to Readability as service

I'm looking for a way to extract clean HTML content from different kinds of pages (blog articles, magazines, etc.). The basic idea is how the 'Reader' in iOS Safari works.
From this answer I learned that iOS Safari uses Readability for content parsing. Unfortunately, the API does not include any methods for parsing; instead you have to save a bookmark and then fetch its content, which does not suit me well.
Another answer here suggests using https://www.readability.com/api/content/v1/parser, but it does not work for me.
Any suggestions for similar services?

Have a look at Tranquility. It is a Firefox Add-on so you can look at the source. You can download the XPI and unpack it. Then look into content/tranquility.js and the related files in content/.
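
If you would rather roll a rough version yourself, here is a minimal sketch of the general idea using only the Python standard library. This is not how Tranquility or Readability work internally (I have not checked their heuristics); it assumes the article text lives in `<p>` elements, that anything inside `nav`, `header`, `footer`, `aside`, `form`, `script` or `style` is boilerplate, and that very short paragraphs are noise. The names `ParagraphExtractor`, `extract_main_text` and the `min_chars` threshold are made up for illustration.

```python
from html.parser import HTMLParser

# Containers assumed to hold boilerplate rather than article text.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}


class ParagraphExtractor(HTMLParser):
    """Collect the text of <p> elements that sit outside SKIP_TAGS containers."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0        # > 0 while inside a SKIP_TAGS element
        self.in_paragraph = False
        self.buffer = []
        self.paragraphs = []

    def _flush(self):
        # Normalise whitespace and store the paragraph if it has any text.
        text = " ".join("".join(self.buffer).split())
        if text:
            self.paragraphs.append(text)
        self.buffer = []
        self.in_paragraph = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            if self.in_paragraph:  # previous <p> was never closed
                self._flush()
            self.in_paragraph = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "p" and self.in_paragraph:
            self._flush()

    def handle_data(self, data):
        if self.in_paragraph and self.skip_depth == 0:
            self.buffer.append(data)


def extract_main_text(html, min_chars=40):
    """Return the paragraphs that look like article content."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return [p for p in parser.paragraphs if len(p) >= min_chars]
```

Calling `extract_main_text(page_html)` gives you a list of paragraph strings. Real pages will need smarter scoring (link density, class names and so on), which is exactly the part worth reading in an existing extension's source.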

Related

Fetch Web HTML data in tvOS

I have an iOS app where I read the data from the website in a UIWebView which is hidden from the user (don't worry, it's my own website), parse the data from the resulting HTML, read specific info and put it into NSArrays to display in UITable. All is well and good and works in iOS.
- (void)webViewDidFinishLoad:(UIWebView *)webView2
{
    NSLog(@"webViewDidFinishLoad ...");
    // Grab the full rendered HTML of the page so it can be parsed.
    NSString *htmlSourceCodeStr = [webView2 stringByEvaluatingJavaScriptFromString:@"document.documentElement.outerHTML"];
}
I was thinking of porting the same app to tvOS. Surprise, surprise: UIWebView isn't available in tvOS.
Is there a way for me to load or fetch website data in tvOS, then parse and read the resulting HTML? Is that even possible?

Just make the HTTP request directly using NSURLSession. You'll get the HTML back as NSData and can parse it from there just like before. You should probably do the same in your iOS app: if you're not going to show the HTML to the user then there's no point in using something as heavyweight as a web view.

To answer your question, no (kind of), it is not possible at the moment. The link you posted pretty much answered your question - there may be a way using private APIs or using UIWebViews (maybe), but Apple would never allow it to be published. Not only that, but WebKit isn't available either. This pretty much sums it up:
Companies hoping to leverage a universal HTML5/CSS/JS-based UI for multiple platforms may no longer find this option viable. They will need to rethink their strategy for providing a cohesive branded experience on multiple platforms while still writing native apps for those platforms. That’s not to say that someone wouldn’t be able to try compiling WebKit and creating their own view within to render HTML. Certainly there are companies with the manpower and resources to do this. I’m not certain those apps wouldn’t get rejected, though.
Very sorry I couldn't give you the answer you were looking for. Good luck :)
Sources
https://developer.apple.com/library/tvos/releasenotes/General/tvOS90APIDiffs/index.html
https://medium.com/bpxl-craft/apple-tv-a-world-without-webkit-5c428a64a6dd#.s2h3xvsx6
Web app in tvOS
http://www.idownloadblog.com/2015/11/06/apple-tv-browser-hack/

What are dpuf (extension) files?

I have seen this extension in some urls and I would like to know what they are used for.
It seems odd, but I couldn't find any information about them. I think they are specific to some plug-in.

It seems to be connected to 'Share This'-buttons on the websites.
I found this page which gives a quite comprehensive explanation:
This tag is mainly developed for tracking the URL sharing on various Social Networks, so every time anyone copies your blog content there he gets the URL ending with #sthash and extension with .dpuf or .dpbs
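
If all you want is to clean such URLs up before storing or sharing them, something like the sketch below should do. It assumes the tracking part is always a URL fragment of the form `sthash.<token>.dpuf` or `sthash.<token>.dpbs` with an alphanumeric token; that shape is my guess from the URLs I've seen, not a documented format. `strip_share_fragment` is just an illustrative name.

```python
import re
from urllib.parse import urldefrag

# Assumed shape of the tracking fragment: "sthash.<token>.dpuf" or ".dpbs".
_STHASH_FRAGMENT = re.compile(r"^sthash\.[A-Za-z0-9]+\.(dpuf|dpbs)$")


def strip_share_fragment(url):
    """Drop the fragment if it looks like a share-tracking hash; otherwise keep the URL as is."""
    base, fragment = urldefrag(url)
    if _STHASH_FRAGMENT.match(fragment):
        return base
    return url
```

Since the part after `#` is a fragment, it is normally not sent to the server at all, so removing it does not change which page is loaded.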

Converting Webpage to PDF

I have a project and the old programmer thought converting a webpage to PDF would be easy using web-based conversion software. I'm not so sure since it requires headers/footers and it's a listings page, so it will need to know when to & when not to page break, or else it will start new pages halfway through an item on the list. I've also had problems with it cutting up images between two pages.
I've tried convincing the client that the requirements are too much and we need to create the PDF using PHP, but they are convinced building a page in HTML and converting it will work.
So I want to know if there is any web-based conversion software out there that supports converting HTML with headers/footers and lets me control where page breaks may and may not occur.
Thanks.

There are plenty of SaaS services out there. Here's another SaaS one I highly recommend.
It's htm2pdf.co.uk and they have both a PDF API (that works with HTTP GET and supports all platforms) as well as an HTML to PDF SDK (that works with HTTP POST and is only available in PHP).
It is based on WebKit and therefore supports anything WebKit does. WebKit is what browsers like Safari & Chrome are based on. It supports headers, footers, page breaking and so on, as well as additional PDF features like encryption and watermarking.

I work at Expected Behavior, and we have a product called DocRaptor that converts HTML code to PDF with an HTTP POST request. DocRaptor can definitely handle headers, footers and page breaks. DocRaptor is a SaaS application, and every plan has a 30-day trial.
Here's a link to DocRaptor's home page:
DocRaptor
And a link to our coding examples:
DocRaptor coding examples

Hiding Chrome bookmark text via extension

I'm trying to get a start in programming by writing a Chrome extension similar to the Smart Bookmarks Bar extension for Firefox. Java seems straightforward enough, and I can probably figure out the specifics of building an extension but I can't find out what commands I need to change the rendering of the bookmarks.
1) Does anyone know where I could find the relevant documentation?
2) Does anyone know of extensions that interact with bookmark rendering whose source code I could take a look at?

Everything you can do with the bookmarks is listed in the API:
http://code.google.com/chrome/extensions/dev/bookmarks.html
(and as someone said here on SO: java is related to javascript as a car is related to a carpet :] )

How do I create "accessible" PDFs from HTML?

Does anyone have any suggestions on how to generate accessible PDFs (including images) from HTML?
The PDFs need to look like the original HTML, including positions of images etc.
Any special HTML structure required to help make the final PDF accessible?
I've seen questions about creating PDFs, but none of them specifically address the important issue of accessibility.
My poison of choice is Perl but references to any program, language or library will help.
I have a more in-depth question at Doctype if anyone has more general information to offer.
http://doctype.com/TiB
Also,
I, and others, would find it useful if users with accessibility problems could comment if they find the "usability experience" of using PDFs better or worse than reading from Plain Old Semantic HTML (POSH).
Thanks
Mike

Look into PrinceXML. Through CSS you can control margins, page breaking and orientation. While not open source, you can try it for free, but it places a small watermark in the upper right corner.
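
To give an idea of what that CSS control looks like, here is a small sketch that generates a listings page with print rules attached. The CSS properties are standard paged-media/fragmentation properties, but whether a particular converter honours each one is something you have to test; I'm not claiming any specific tool supports all of them. The class names, the A4 size, the margins and the `render_listings` helper are made up for the example.

```python
from html import escape

# Print rules: page size and margins, keep each listing (and its image) on
# one page, and allow a forced break before elements marked "new-page".
PRINT_CSS = """
@page {
  size: A4 portrait;
  margin: 25mm 20mm;
}
.listing, .listing img {
  page-break-inside: avoid;  /* older property name */
  break-inside: avoid;       /* current property name */
}
.new-page {
  page-break-before: always;
  break-before: page;
}
"""


def render_listings(items):
    """Build an HTML document from (title, description) pairs, with print CSS embedded."""
    blocks = "\n".join(
        f'<div class="listing"><h2>{escape(title)}</h2><p>{escape(desc)}</p></div>'
        for title, desc in items
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><style>{PRINT_CSS}</style></head>\n'
        f"<body>\n{blocks}\n</body></html>"
    )
```

You would then hand the string returned by `render_listings([...])` to whichever HTML-to-PDF converter you end up using and check how it paginates.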

The Adobe ColdFusion server product does a really fine job of this, not surprisingly. But it's not free, and the open source implementations of the language (Smith and BlueDragon) don't support the pdf stuff.
Developer licenses to Adobe ColdFusion are free, and you can download it.

I've done this on a small scale by scripting Safari to print to PDFs. I don't recommend it for large-scale projects though.

By far the most capable PDF publishing tool I've ever come across is reportlab. There is an open source library written in Python and a proprietary system that allows you to construct a document using RML, a custom XML spec. The latter is easier for more complex docs. They tend to be very flexible (and reasonable) with pricing.
Not strictly an answer to your question as it doesn't handle html-to-pdf conversions, but perhaps of use to you.
