Send and receive data to and from a website using the TWebBrowser component in Delphi - html

I'm creating a VCL application with Delphi 10.3 and want to add some web functionality: the user enters the ISBN of a book into a TEdit component, and the application passes this value to the search field on this website: https://isbnsearch.org. The website then looks up the ISBN and displays the author of the book. I want to access the information (i.e. the author) shown in the search result and use it in my application.
This is my GUI, for a better idea of what I want to accomplish:
What code can I use for this? Any other feasible suggestions or approaches are acceptable.

When performing a search on that website, it simply loads a page with a specific URL query string...
https://isbnsearch.org/search?s=suess
The above example is when I search for "suess", so you can easily concatenate a search URL.
You can use any HTTP component, such as TIdHTTP, to load this search page, then use an HTML parser to scrape the page and read what you need. Much, much easier than trying to read through the TWebBrowser.
In the end, you won't actually display the HTML (I mean you can if you want to), but the idea is to read the data and display it in your own format.
On that specific page, start by locating the ul element with id searchresults. Each li element inside it contains an individual result. Unfortunately, this website uses pagination and only shows 10 results per page. To get the remaining results, request the page again with an additional parameter: &p=2 for the 2nd page, &p=3 for the 3rd page, and so on.
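The search and pagination URLs described above can be built mechanically. Here is a minimal sketch in JavaScript, used only for illustration (the same concatenation works with Delphi strings). It assumes the s and p parameters behave as described; buildSearchUrl is a hypothetical helper name.

```javascript
// Hypothetical helper: builds an isbnsearch.org search URL.
// Assumes the "s" (search term) and "p" (page) parameters described above.
function buildSearchUrl(term, page = 1) {
  const url = new URL("https://isbnsearch.org/search");
  url.searchParams.set("s", term);
  // The first page needs no "p" parameter; later pages add &p=2, &p=3, ...
  if (page > 1) {
    url.searchParams.set("p", String(page));
  }
  return url.toString();
}

console.log(buildSearchUrl("suess"));    // first page of results
console.log(buildSearchUrl("suess", 2)); // second page of results
```

Using URL and searchParams rather than raw concatenation also takes care of escaping search terms that contain spaces or other special characters.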
On the other hand, scraping is the worst way to acquire such information. What you should be doing is using a proper API which gives you machine-friendly data. The service you are referencing doesn't appear to offer one, but here's an example of one which does:
https://openlibrary.org/dev/docs/api/books - this also appears to provide you MUCH more information than the one you're using.
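As a rough illustration of the API approach, the sketch below builds a Books API request and pulls author names out of a parsed response. The bibkeys, format and jscmd parameters and the response shape (an object keyed by "ISBN:<isbn>" with an authors array) are my reading of the linked documentation and should be checked against it. The helper names and the sample object are made up.

```javascript
// Builds a request URL for the Open Library Books API.
// Parameter names are assumptions taken from the linked documentation.
function buildOpenLibraryUrl(isbn) {
  const url = new URL("https://openlibrary.org/api/books");
  url.searchParams.set("bibkeys", `ISBN:${isbn}`);
  url.searchParams.set("format", "json");
  url.searchParams.set("jscmd", "data");
  return url.toString();
}

// Extracts author names from a parsed JSON response.
// Assumed shape: { "ISBN:<isbn>": { authors: [{ name: "..." }, ...] } }
function extractAuthors(response, isbn) {
  const record = response[`ISBN:${isbn}`];
  if (!record || !Array.isArray(record.authors)) {
    return [];
  }
  return record.authors.map((author) => author.name);
}

// Made-up sample in the assumed shape, not real API output.
const sample = {
  "ISBN:0000000000": { title: "Example", authors: [{ name: "Jane Doe" }] },
};
console.log(extractAuthors(sample, "0000000000"));
```

In a Delphi application the same two steps would be an HTTP GET followed by a JSON parse, with no HTML scraping involved.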

Related

How to extract data and generate URL from it?

I'm new to Stack Overflow (Hello World!). I have some basic understanding of JS, C++, HTML, and CSS and I have been looking in this and other forums, but I am having problems figuring this one out, mostly because I don't know what this would be called (TLDR at the bottom):
Essentially, I would like to build a Chrome extension that extracts data from a website (in this case Copart, a website where people sell cars) and creates a link from it that opens another window to one of three car evaluators (Edmunds, KBB, NADA). I fix cars as a hobby but it's a pain to have to input vehicle info over and over, so I wanted to automate the process as much as possible. Hopefully this will help others as well.
E.g. a generic link to Edmunds is: https://www.edmunds.com/ford/escape/2018/appraisal-value/?vin=XXXXXXXXXXXXXX. I would like to know how to extract the make, model, year and VIN, in this case, from Copart (Example copart page). On KBB, e.g., all I see that can be automated is inputting the VIN into the window and clicking "Go". Is there a way to have the plugin automatically select "VIN" and copy the VIN into the field while clicking the "Go" button?
I know, a lot of questions. I'm also not quite sure what this would be called? A crawler? A scraper? A craper? :)
Either way, here the basic (TLDR) question:
How to create a chrome plugin that extracts data from one website, opens a URL using that data, and which then performs an action like switching a label, populating a textbox, and clicking a button on that URL?
I have only posed this question here so if there's a better place to put it, please let me know.
Mark
Extracting data from one website and then looking up the scraped data on another website:
1. For this project you can use a combination of Selenium and Scrapy.
2. Since both sites are dynamic pages driven by JavaScript, you need to check their security constraints.
3. You can write Scrapy spiders, each of them backed by Selenium.
4. Pressing the Go button can be done with Selenium.
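Whichever tool does the scraping, the URL-generation step is plain string building. Here is a sketch in JavaScript (the language a Chrome extension content script would use) that follows the generic Edmunds link quoted in the question. The helper name and the lowercase-with-hyphens rule for make and model are assumptions, not documented Edmunds behaviour.

```javascript
// Hypothetical helper: builds an Edmunds appraisal URL from scraped vehicle data,
// following the pattern of the generic link quoted in the question.
function buildEdmundsUrl({ make, model, year, vin }) {
  // Assumption: path segments are lowercase with spaces turned into hyphens.
  const slug = (text) =>
    encodeURIComponent(String(text).trim().toLowerCase().replace(/\s+/g, "-"));
  return (
    `https://www.edmunds.com/${slug(make)}/${slug(model)}/${year}` +
    `/appraisal-value/?vin=${encodeURIComponent(vin)}`
  );
}

console.log(
  buildEdmundsUrl({ make: "Ford", model: "Escape", year: 2018, vin: "XXXXXXXXXXXXXX" })
);
```

In an extension, the resulting string could be opened in a new tab once the make, model, year and VIN have been read from the listing page.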

Can Go capture a click event in an HTML document it is serving?

I am writing a program for managing an inventory. It serves up HTML based on records from a PostgreSQL database, or writes to the database using HTML forms.
Different functions (adding records, searching, etc.) are accessible using <a></a> tags or form submits, which in turn call functions using http.HandleFunc(), functions then generate queries, parse results and render these to html templates.
The search function renders query results to an HTML table. To keep the search results page usable and uncluttered, I intend to show only the most relevant information there. However, since there are many more details stored in the database, I need a way to access that information too. To do that, I wanted to make each table row clickable, displaying the details of the selected record in a status area at the bottom or side of the page, for instance.
I could try to follow the pattern that works for running the other functions, that is use <a></a> tags and http.HandleFunc() to render new content but this isn't exactly what I want for a couple of reasons.
First: There should be no need to navigate away from the search result page to view the additional details; there are not so many details that a single record's full data should not be able to be rendered on the same page as the search results.
Second: I want the whole row clickable, not merely the text within a table cell, which is what the <a></a> tags get me.
Using the id returned from the database in an attribute, as in <div id="search-result-row-id-{{.ID}}"></div> I am able to work with individual records but I have yet to find a way to then capture a click in Go.
Before I run off and write this in JavaScript, does anyone know of a way to do this strictly in Go? I am not particularly averse to using the tried-and-true JS methods, but I am curious to see if it could be done without them.
"Does anyone know of a way to do this strictly in Go?"
As others have indicated in the comments, no, Go cannot capture the event in the browser.
For that you will need to use some JavaScript to send to the server (where Go runs) the web request for more information.
You could also push all the required information to the browser when you first serve the page and hide/show it based on CSS/JavaScript event but again, that's just regular web development and nothing to do with Go.
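The first option, JavaScript that asks the Go server for more information, can be sketched as follows. It reuses the search-result-row-id-{{.ID}} id pattern from the question and assumes numeric IDs. The /details endpoint and the status-area element id are hypothetical; the Go side would need a matching http.HandleFunc that renders the detail fragment.

```javascript
// Extracts the record ID from an element id such as "search-result-row-id-42".
// Assumes numeric IDs; returns null for anything else.
function recordIdFromElementId(elementId) {
  const match = /^search-result-row-id-(\d+)$/.exec(elementId);
  return match ? match[1] : null;
}

// Hypothetical endpoint served by the Go program.
function detailsUrl(recordId) {
  return `/details?id=${encodeURIComponent(recordId)}`;
}

// Browser-only wiring, skipped when no DOM is present (e.g. under Node).
if (typeof document !== "undefined") {
  // One delegated listener makes every result row clickable as a whole.
  document.addEventListener("click", async (event) => {
    const row = event.target.closest('[id^="search-result-row-id-"]');
    if (!row) return;
    const id = recordIdFromElementId(row.id);
    if (id === null) return;
    const response = await fetch(detailsUrl(id));
    // "status-area" is an assumed id for the details panel on the page.
    document.getElementById("status-area").innerHTML = await response.text();
  });
}
```

Because the listener is attached to the document rather than to each row, it keeps working when the result table is re-rendered.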

Display HTML articles in an easy-to-read format

I have looked at the Readability API, which is useful for displaying data in a clean format on an HTML webpage. I am passing a URL to http://www.readability.com/read?url= to display the data. I am initially directed to a page where I can choose to view the content using Readability. Is there any way I can view the content directly in a neat fashion, without going through the redirect?
Take a look at Readability's API: http://www.readability.com/developers/api
Before you implement your code, you have to create an API Key on their website.

dynamic HTML page to pdf

I know there is a list of similar questions but all handle pages without user interaction (static even though some js may be there).
Let's say we have a page the user can interact with (e.g. an SVG that changes, or HTML tables with drill-down, where the content changes). Those interactions change the page. The same happens on Stack Overflow when entering a question...
The idea is to add a "convert to pdf" button that takes the current state of the HTML and sends a PDF version back to the user (we have a Java server).
Using the print of the browser is not the answer I'm looking for :-).
Is this a stick in the moon ?
You would have to store the parameters that generate the HTML view (i.e. what the user clicks on, what selections they make, etc.). If you have a list of parameters that generate the HTML view, you can write a method which accepts that list (as a JSON post?), generates the HTML view and passes it to your PDF-generating routine. I'm not too familiar with Java libraries for this purpose, but PHP has TCPDF, which can take HTML output and generate a PDF for you. Certainly, there are Java libraries which will allow you to do the same thing, or you can use the parameters to get a list of rows/arrays which can be iterated over and output using the PDF library of your choice.
Both iTextPDF and Aspose.PDF would allow you to do that (I've seen them used in two different projects), but there is no magic and you will have to do some work.
The steps are roughly:
1. Get (as a string) the part of the document which you want to print, with jQuery or innerHTML.
2. Call a service on the server side to convert this to PDF.
3. [Server side] Use a whitelist-based tool to clean up the HTML (unless you want to be hacked). JSoup is great for that.
4. [Server side] Use the iText or Aspose API to create the PDF from the HTML (this is not trivial, you will have to read the docs).
5. Download the document.
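The client-side steps above (grab the HTML, call the service, download the result) might look like the sketch below. The /api/pdf endpoint, the JSON payload shape and the report element id are hypothetical; the server-side cleaning and conversion are not shown.

```javascript
// Builds the fetch() options for sending an HTML snapshot to the server.
// The { html: ... } payload shape is an assumption for this sketch.
function buildPdfRequest(html) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ html }),
  };
}

// Browser-only: snapshot the element, post it, and open the returned PDF.
// "/api/pdf" and the "report" element id are hypothetical names.
async function convertToPdf() {
  const html = document.getElementById("report").innerHTML;
  const response = await fetch("/api/pdf", buildPdfRequest(html));
  const blob = await response.blob();
  window.open(URL.createObjectURL(blob));
}
```

Note that innerHTML serializes attributes, not the live values typed into form fields, so any input state that must appear in the PDF has to be copied into the markup before the snapshot is taken.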
I'd also recommend DocRaptor, an HTML to PDF API built by my company, Expected Behavior.
DocRaptor uses Prince XML to generate PDFs, and thus produces higher quality results than similar products.
Adding PDF generation to your own web application using our service is as simple as making an HTTP POST request to our server.
Here's a link to DocRaptor's home page:
DocRaptor
And a link to our API documentation:
DocRaptor API documentation

REST/Ajax deep linking compatibility - Anchor tags vs query string

So I'm working on a web app, and I want to filter search results.
A nice restful implementation might look like this:
1. mysite.com/clothes/men/hats+scarfs
But let's say we want to ajax up the filtering, like the cool kids, and we want to retain deep linking. We might use the anchor tag and parse that with JavaScript to show the correct listings:
2. mysite.com/clothes#/men/hats+scarfs
However, if someone clicks the first link with JS enabled, and then changes filters, we might get:
3. mysite.com/clothes/men/hats+scarfs#/women/shoes
Urk.
Similarly, if someone does not have JS enabled, and clicks link 2 - JS will not parse the options and the correct listings will not be shown.
Are Ajax deep links and non-Ajax links incompatible? It would seem so, as servers cannot parse the # part of a url, since it is not sent to the server.
There's a monkeywrench being thrown into this issue by Google: A proposal for making Ajax crawlable. Google is including recommendations for url structure there that may give you ideas for your own application.
Here's the wrapup:
"In summary, starting with a stateful URL such as http://example.com/dictionary.html#AJAX, it could be available to both crawlers and users as http://example.com/dictionary.html#!AJAX, which could be crawled as http://example.com/dictionary.html?_escaped_fragment_=AJAX, which in turn would be shown to users and accessed as http://example.com/dictionary.html#!AJAX"
View Google's Presentation here (note: google docs presentation)
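The mapping in that summary, from a #! URL to its _escaped_fragment_ form, can be sketched like this. toEscapedFragmentUrl is a hypothetical name, and the sketch only handles simple fragments such as the one in the quoted example; the proposal's exact escaping rules are not reproduced here.

```javascript
// Maps a "#!" URL to the "_escaped_fragment_" form described in the summary.
// Returns null when the URL has no "#!" fragment.
function toEscapedFragmentUrl(href) {
  const url = new URL(href);
  if (!url.hash.startsWith("#!")) {
    return null;
  }
  const state = url.hash.slice(2); // drop the leading "#!"
  url.hash = "";                   // the fragment is never sent to the server
  url.searchParams.append("_escaped_fragment_", state);
  return url.toString();
}

console.log(toEscapedFragmentUrl("http://example.com/dictionary.html#!AJAX"));
```

A server that understands the query-string form can then render the same state that JavaScript would have produced from the fragment.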
In general I think it's useful to simply turn off JavaScript and CSS entirely and browse your website and web application and see what ends up getting exposed. Once you get a sense of what's visible, you will understand what most search engines see and that in turn will show you what is and is not getting spidered.
If you go to mysite.com/clothes/men/hats+scarfs with JavaScript enabled, then your JavaScript should automatically rewrite that to mysite.com/clothes#men/hats+scarfs. When you click on a filter, the click should be handled by JavaScript, meaning you'll only change the hash fragment rather than the entire URL (as you're going to have to return false anyway).
The problem you have is with non-JS users going to your JS-enabled deep links, as the server can't see the fragment. Unfortunately, the only thing you can do is take them to mysite.com/clothes and make them start their journey again (as far as I'm aware). You'll need to try to ensure that when people link to the site, they use the hardcoded deep link rather than the hashed deep link.
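A minimal sketch of that automatic rewrite, assuming /clothes is the base path of the listing page (toHashUrl is a hypothetical helper name):

```javascript
// Rewrites a filter path such as "/clothes/men/hats+scarfs" to its hash form.
// Returns null when the path carries no filter and needs no rewrite.
function toHashUrl(pathname, base = "/clothes") {
  if (!pathname.startsWith(base + "/")) {
    return null;
  }
  const filter = pathname.slice(base.length + 1);
  return filter ? `${base}#${filter}` : null;
}

// Browser-only: redirect hardcoded deep links to their hash form on page load.
if (typeof window !== "undefined") {
  const target = toHashUrl(window.location.pathname);
  if (target) {
    // replace() keeps the hardcoded URL out of the back-button history.
    window.location.replace(target);
  }
}
```

Users without JavaScript never run this code, so they stay on the hardcoded URL and get the server-rendered listing.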
I don't recommend ever using the query string, as you are sending data back to the server without direct relevance to the previously specified destination. That is a potential security hole, as malicious code can be manually added to the query string to attempt an XSS or buffer overflow attack on your web server.
I believe REST was intended to work with absolute URIs without a query string, because then you're specifying only the location of a resource, and it is that location that is descriptive and semantically relevant, in addition to the possibility of the resource being equally relevant. Even if there is no resource at the specified path, you have still instantiated a potentially unique and descriptive location that can be processed accordingly.
Users entering the site via deep links
Nonsensical links (like /clothes/men/hats#women/shoes) can be avoided if you construct your Ajax initialisation code in such a way that users who enter the site on filtered pages (e.g. /clothes/women/shoes) are taken to the /clothes page before any Ajax filtering happens. For example, you might do something like this (using jQuery):
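// Rewrite each filter link from /clothes/<filter> to /clothes#<filter>
$("a.filter")
    .each(function() {
        var href = $(this).attr("href").replace("/clothes/", "/clothes#");
        $(this).attr("href", href);
    })
    // On click, pass the part after "#" to update_filter (the app's own Ajax filter function)
    .click(function() {
        update_filter($(this).attr("href").split("#")[1]);
    });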
Users without JavaScript
As you said in the question, there's no way for the server to know about the URL fragment so filtering would not be applied for users without JavaScript enabled if they were given a link to /clothes#filter.
However, even without filtering, these links could be made more meaningful for non-JS users by using the filter strings as IDs in your /clothes page. To prevent this messing with the Ajax experience the IDs would need to be changed (or the elements removed) with JavaScript before the Ajax links were initialised.
How practical this is depends on how many categories you have and what your /clothes page contains.