Display HTML articles in an easy-to-read format - html

I have looked at the Readability API, which is useful for displaying data in a clean format on an HTML webpage. I am passing a URL to http://www.readability.com/read?url= to display the data. I am initially directed to a page where I can choose to view the info using Readability. Is there any way I can view the content directly in a neat fashion without going through the redirect?

Take a look at Readability's API: http://www.readability.com/developers/api
Before you implement your code, you have to create an API key on their website.

Related

Send and receive data to and from a website using the TWebBrowser component in Delphi

I'm creating a VCL application with Delphi 10.3 and want to support some web functionality: the user enters the ISBN of a book into a TEdit component, and the application sends this value to the search field on this website: https://isbnsearch.org. The website then looks up the ISBN and displays the author of the book. I want to somehow access the information (i.e. the author) presented by the search result and use it in my application.
This is my GUI, for a better idea of what I want to accomplish:
What code can I use for this? Any other feasible suggestions or approaches are acceptable.
When performing a search on that website, it simply loads a page with a specific URL query string...
https://isbnsearch.org/search?s=suess
The above example is when I search for "suess", so you can easily concatenate a search URL.
You can use any HTTP component, such as TIdHTTP, to load this search page, then use an HTML parser to scrape the page and read what you need. Much, much easier than trying to read through the TWebBrowser.
In the end, you won't actually display the HTML (I mean you can if you want to), but the idea is to read the data and display it in your own format.
On that specific page, start by locating the ul element with id searchresults. Each li element inside it holds one result. Unfortunately, this website uses pagination and only shows 10 results per page. To get the remaining results, request the page again with an extra parameter: &p=2 for the 2nd page, &p=3 for the 3rd page, and so on.
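The approach described above can be sketched as follows. This is a minimal illustration in Python rather than Delphi, using only the standard library; the sample markup is hypothetical and only mirrors the ul/li structure described in the answer, so the real page may need different handling.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

BASE_URL = "https://isbnsearch.org/search"  # search endpoint from the answer


def search_url(term, page=1):
    """Build the search URL; the 'p' parameter is only added from page 2 on."""
    params = {"s": term}
    if page > 1:
        params["p"] = page
    return BASE_URL + "?" + urlencode(params)


class SearchResultsParser(HTMLParser):
    """Collect the text of each <li> inside <ul id="searchresults">."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._in_list = False
        self._current = None  # text chunks of the <li> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "ul" and dict(attrs).get("id") == "searchresults":
            self._in_list = True
        elif tag == "li" and self._in_list:
            self._current = []

    def handle_endtag(self, tag):
        if tag == "li" and self._current is not None:
            self.results.append(" ".join(self._current))
            self._current = None
        elif tag == "ul" and self._in_list:
            self._in_list = False

    def handle_data(self, data):
        # Ignore whitespace-only text between tags.
        if self._current is not None and data.strip():
            self._current.append(data.strip())


def parse_results(html):
    parser = SearchResultsParser()
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical markup, shaped like the structure described in the answer.
SAMPLE = """
<ul id="searchresults">
  <li><h2>The Cat in the Hat</h2><p>Author: Dr. Seuss</p></li>
  <li><h2>Green Eggs and Ham</h2><p>Author: Dr. Seuss</p></li>
</ul>
"""

print(search_url("suess", page=2))
print(parse_results(SAMPLE))
```

In a real program the HTML would come from an HTTP request to search_url(...), repeated with increasing page numbers until a page returns no results.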
On the other hand, that is the worst way to acquire such information. What you should be doing is using a proper API which gives you machine-friendly data. The service you are referencing doesn't appear to have an option, but here's an example of one which does:
https://openlibrary.org/dev/docs/api/books - this also appears to provide you MUCH more information than the one you're using.
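As a rough sketch of what using such an API could look like, the Python snippet below builds a request for the Open Library Books API and pulls the author names out of the decoded JSON. The sample response is a trimmed, illustrative stand-in for what a jscmd=data request returns, not captured output, and the helper names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://openlibrary.org/api/books"


def books_api_url(isbn):
    """Build a Books API request for a single ISBN."""
    params = {"bibkeys": "ISBN:" + isbn, "format": "json", "jscmd": "data"}
    return API_URL + "?" + urlencode(params)


def author_names(response, isbn):
    """Return the author names from a decoded Books API response."""
    book = response.get("ISBN:" + isbn, {})
    return [author["name"] for author in book.get("authors", [])]


def fetch_authors(isbn):
    """Perform the request; needs network access, so it is not called here."""
    with urlopen(books_api_url(isbn)) as reply:
        return author_names(json.load(reply), isbn)


# Illustrative response shape (assumption): one entry keyed by the bibkey.
SAMPLE_RESPONSE = {
    "ISBN:0451526538": {
        "title": "The Adventures of Tom Sawyer",
        "authors": [{"name": "Mark Twain"}],
    }
}

print(books_api_url("0451526538"))
print(author_names(SAMPLE_RESPONSE, "0451526538"))
```

The same two steps (build the URL, read the JSON) carry over to Delphi with TIdHTTP and a JSON library; no HTML parsing is needed.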

Extract content from Wikipedia to Mediawiki

Is there a way to get the intro content from a Wikipedia page onto my MediaWiki page? I was thinking of using Wikipedia's API, but I don't know how to parse the URL on my page, or how to do it with templates. I just want a query that will display the introduction part of a Wikipedia page on my page.
I used the External_Data Extension and Wikipedia's api to achieve this.
The API
http://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json&exintro=&titles=[title of wikipedia page]
How I used it
{{#get_web_data:
url=http://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json&exintro=&titles={{PAGENAME}}
|format=JSON|data=extract=extract}}
How I displayed the extract on pages
{{#external_value:extract}}
However, I still need to figure out how to get only one paragraph from the returned text. I will probably use a parser function.
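One way to reduce the returned extract to a single paragraph, sketched here in Python outside of MediaWiki: decode the JSON, take the extract HTML of the first page entry, and return the first non-empty paragraph as plain text. The sample response below is a hand-written stand-in that assumes the usual query/pages/extract nesting, and the function names are made up for this example.

```python
import re
from html import unescape
from urllib.parse import urlencode


def extract_url(title):
    """Build the same query as above for a given page title."""
    params = {
        "action": "query",
        "prop": "extracts",
        "format": "json",
        "exintro": "",
        "titles": title,
    }
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)


def first_paragraph(response):
    """Return the first non-empty <p> of the extract as plain text."""
    pages = response["query"]["pages"]
    page = next(iter(pages.values()))  # a single title gives a single page
    html = page.get("extract", "")
    for match in re.finditer(r"<p\b[^>]*>(.*?)</p>", html, flags=re.DOTALL):
        # Strip inline tags such as <b>, then decode entities.
        text = unescape(re.sub(r"<[^>]+>", "", match.group(1))).strip()
        if text:
            return text
    return ""


# Hand-written sample (assumption), including an empty leading paragraph.
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "12345": {
                "title": "Example",
                "extract": '<p class="empty">\n</p>'
                           "<p><b>Example</b> is a short &amp; simple intro.</p>"
                           "<p>Second paragraph.</p>",
            }
        }
    }
}

print(extract_url("Example"))
print(first_paragraph(SAMPLE_RESPONSE))
```

The TextExtracts API also has parameters such as exsentences and explaintext that may make this post-processing unnecessary; it is worth checking the extension's documentation before relying on them.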

wikipedia template data api

I want to download the template source used in a Wikipedia page (basically for generating the display text of a key). So I basically want this info:
http://en.wikipedia.org/w/index.php?title=Template:Infobox%20cricketer&action=edit
for Template:Infobox cricketer
I have found an api for wikipedia called Template data
http://www.mediawiki.org/wiki/Extension:TemplateData
But the example given:
http://en.wikipedia.org/w/api.php?action=templatedata&titles=Template:Stub
does not seem to work.
I think you misunderstood what Extension:TemplateData is for. It's for getting metadata about a template, which only works if that template provides those metadata.
If what you want is the text of the template, you should use prop=revisions&rvprop=content, for example:
http://en.wikipedia.org/w/api.php?action=query&titles=Template:Infobox%20cricketer&prop=revisions&rvprop=content
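A small Python sketch of reading the wikitext out of such a response, assuming format=json is added to the query. Depending on the request parameters the content sits either directly under the revision or under slots/main, so the helper (a made-up name) checks both; the sample responses are hand-written stand-ins.

```python
from urllib.parse import urlencode


def template_source_url(title):
    """Build the revisions query from the answer, asking for JSON output."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "revisions",
        "rvprop": "content",
        "format": "json",
    }
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)


def wikitext(response):
    """Return the page content from a decoded revisions response."""
    page = next(iter(response["query"]["pages"].values()))
    revision = page["revisions"][0]
    if "slots" in revision:  # shape returned when rvslots=main is requested
        return revision["slots"]["main"]["*"]
    return revision["*"]  # older shape without slots


# Hand-written samples (assumption) for the two response shapes.
LEGACY = {"query": {"pages": {"1": {"revisions": [{"*": "{{Infobox ...}}"}]}}}}
SLOTS = {
    "query": {
        "pages": {
            "1": {"revisions": [{"slots": {"main": {"*": "{{Infobox ...}}"}}}]}
        }
    }
}

print(template_source_url("Template:Infobox cricketer"))
print(wikitext(LEGACY) == wikitext(SLOTS))
```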

How do I grab figures from a url and place them in my page?

http://www.bloomberg.com/markets/ has several figures that I would like to display on my html page.
If I just have a div and I want it to display the percentage change of some financial market, how do I get the div to display whatever figure is published on Bloomberg? That way, whenever I reload my website, the most up-to-date figure from Bloomberg is displayed in plain text in my div.
So instead of
<div>0.05%</div>
I have
<div>(some code here to pull the correct figure from bloomberg)</div>
Bloomberg has an API that you can use to get their market data for free:
http://www.openbloomberg.com/open-api/
Now, you can adopt Bloomberg’s market data interfaces without cost or restriction.
What you are asking for is called data parsing, and it is a pretty common request. If you want to do it using PHP, the PHP Simple HTML DOM parser or phpQuery provide plenty of examples.
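The parsing idea can be illustrated with a short sketch, written here in Python with the standard library rather than PHP. It returns the text of the first element with a given id. The markup and the id sp500-change are hypothetical; Bloomberg's real page structure is not known here, and figures rendered by JavaScript would not be present in the raw HTML at all.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag and so must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ElementTextById(HTMLParser):
    """Capture the text inside the first element whose id matches."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.text = None
        self._depth = 0  # greater than 0 while inside the target element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif self.text is None and dict(attrs).get("id") == self.element_id:
            self._depth = 1
            self.text = ""

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.text += data


def text_by_id(html, element_id):
    parser = ElementTextById(element_id)
    parser.feed(html)
    parser.close()
    return None if parser.text is None else parser.text.strip()


# Hypothetical markup; the id is an assumption made for this example.
SAMPLE = '<div class="quote"><span id="sp500-change"><b>+0.05</b>%</span></div>'

print(text_by_id(SAMPLE, "sp500-change"))
```

The server-side script would fetch the page, run a function like this, and write the result into the div when rendering the page.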

dynamic HTML page to pdf

I know there is a list of similar questions but all handle pages without user interaction (static even though some js may be there).
Let's say we have a page the user can interact with (e.g. an SVG that changes, or HTML tables with drilldown where the content changes). Those interactions will change the page. The same happens on Stack Overflow when entering a question...
The idea is adding a button, "convert to pdf" taking the state of the html and sending to the user back a pdf version (we've a Java server).
Using the print of the browser is not the answer I'm looking for :-).
Is this a stick in the moon ?
You would have to store the parameters that generate the HTML view (i.e. what the user clicks on, what selections they make, etc.). If you have a list of parameters that generate the HTML view, you can write a method which accepts that list (as a JSON POST, perhaps), generates the HTML view and passes it to your PDF-generating routine. I'm not too familiar with Java libraries for this purpose, but PHP has TCPDF, which can take HTML output and generate a PDF from it. There are certainly Java libraries which will allow you to do the same thing, or you can use the parameters to get a list of rows/arrays which can be iterated over and output using the PDF library of your choice.
Both iTextPDF and Aspose.PDF would allow you to do that (I've seen them used in two different projects), but there is no magic and you will have to do some work.
The steps are roughly:
1. Get (as a string) the part of the document which you want to print, with jQuery or innerHTML
2. Call a service on the server side to convert this to PDF
3. [Server side] Use a whitelist-based tool to clean up the HTML (unless you want to be hacked). JSoup is great for that.
4. [Server side] Use the iText or Aspose API to create the PDF from the HTML (this is not trivial, you will have to read the docs)
5. Download the document
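To make the whitelist clean-up step concrete, here is a toy sketch in Python of what "whitelist-based" means: only tags and attributes on an explicit allow list survive, script and style content is dropped, and all text is re-escaped. The allow lists are arbitrary choices for this example. It only illustrates the principle and is not a replacement for a maintained sanitizer such as JSoup on a Java server.

```python
from html import escape
from html.parser import HTMLParser

# Example allow lists (assumptions); everything else is removed.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li",
                "table", "tr", "td", "th", "h1", "h2", "h3"}
ALLOWED_ATTRS = {"colspan", "rowspan"}
DROP_CONTENT = {"script", "style"}  # drop these tags together with their content


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self._skip += 1
        elif tag in ALLOWED_TAGS and not self._skip:
            kept = "".join(
                ' %s="%s"' % (name, escape(value or "", quote=True))
                for name, value in attrs
                if name in ALLOWED_ATTRS
            )
            self.out.append("<%s%s>" % (tag, kept))

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self._skip = max(0, self._skip - 1)
        elif tag in ALLOWED_TAGS and not self._skip:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text arrives decoded, so escape it again before output.
        if not self._skip:
            self.out.append(escape(data))


def sanitize(html):
    parser = WhitelistSanitizer()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


dirty = ('<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>'
         '<td colspan="2">a &amp; b</td>')
print(sanitize(dirty))
```

The sanitized string is what would then be handed to the PDF library in the next step.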
I'd also recommend DocRaptor, an HTML to PDF API built by my company, Expected Behavior.
DocRaptor uses Prince XML to generate PDFs, and thus produces higher quality results than similar products.
Adding PDF generation to your own web application using our service is as simple as making an HTTP POST request to our server.
Here's a link to DocRaptor's home page:
DocRaptor
And a link to our API documentation:
DocRaptor API documentation