YQL link list to HTML

I'm using YQL to return a list of links from specific webpages. So far so good: it returns all the links I want, but I don't know how to get that information into my webpage.
Here's what I'm trying to do:
YQL returns a list of links in the results. I want those links to appear in my webpage, inside a table, inside divs, etc., as if I had written them there myself.
I have been trying to find a way to do this, but I don't know much JavaScript or JSON, so I'm hoping for an answer from those of you who might know a way.

There are a couple of ways to do this, depending on which approach you want to take.
First, and simplest, is server-side generation. This is what happens, for example, when a user hits the Submit button on a search form: your script receives the query, generates the page, and sends that page back to the user. In this case, your question is largely trivial. In pseudocode:
ASSIGN the list of results to list L
FOR EACH ITEM r IN L:
    PRINT a string containing an HTML template,
          substituting the value r where appropriate
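A minimal sketch of that loop as a tiny Go server, since the question doesn't name a server-side language; the links slice is a stand-in for whatever your YQL call returned:

package main

import (
    "html/template"
    "net/http"
)

// html/template escapes each value, so the links can't inject markup.
var tmpl = template.Must(template.New("results").Parse(
    `<ul class="SearchResults">{{range .}}<li><a href="{{.}}">{{.}}</a></li>{{end}}</ul>`))

func handler(w http.ResponseWriter, r *http.Request) {
    // Placeholder for the list L in the pseudocode above.
    links := []string{"http://example.com/a", "http://example.com/b"}
    tmpl.Execute(w, links)
}

func main() {
    http.HandleFunc("/", handler)
    http.ListenAndServe(":8080", nil)
}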
It's trivial enough that I suspect you want to do this via DOM manipulation. This is what requires JavaScript: you get the query, send the request without refreshing the page, and want to add the results to the DOM. If you're receiving the list of results, you're already most of the way there. Using jQuery, you would do the same thing as in the pseudocode above, except that where it has the PRINT statement you would have:
$(".SearchResults").append("<li>" + r + "</li>");
I highly recommend reading through the jQuery tutorial. It's not as hard as you think.

Related

Cannot collect all nodes of Google search result with goquery: some nodes are missing

I am trying to collect the results of a Google search page in Go using the goquery library. To do this, I collect all the nodes of the document selection. The problem is that the selection returned by Find("*") does not seem to contain all the nodes of the HTML document. Question: does the method collect ALL nodes of the whole tree structure or not? If not, is there a method to collect them all?
I tried applying the goquery Find("*") method to the whole document selection, but nodes with certain attributes are not returned, although they are in the HTML document. For instance, nodes with class="srg" are not recognized:
alltags := doc.Find("*") //doc is the HTML doc with the Google search
The selection does not contain the div tags with class="srg". The same applies to other class values, such as "bkWMgd" and "rc".
This has happened to me before. I was trying to scrape the web with Python's Beautiful Soup package and the same thing was happening.
It later turned out that the HTML markup returned by the fetch was actually the markup the server serves after detecting a bot. I solved this by setting the User-Agent header to a Mozilla/5.0 browser string.
Hope this helps in your quest to solve this.
You can start by setting that header in the code for the fetch request you perform.
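A hedged sketch of that fix in Go: build the request yourself so you can set the header, then hand the response body to goquery. The User-Agent string is only an example value.

package main

import (
    "fmt"
    "log"
    "net/http"

    "github.com/PuerkitoBio/goquery"
)

func main() {
    req, err := http.NewRequest("GET", "https://www.google.com/search?q=golang", nil)
    if err != nil {
        log.Fatal(err)
    }
    // Pretend to be a regular browser; Go's default User-Agent
    // ("Go-http-client/1.1") often triggers bot-detection pages.
    req.Header.Set("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        log.Fatal(err)
    }
    // With browser markup, Find("*") now sees the nodes that were missing.
    fmt.Println(doc.Find("*").Length())
}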

Send and receive data to and from a website using the TWebbrowser component in Delphi

I'm creating a VCL application with Delphi 10.3 and want to support some web functionality by having the user enter the ISBN of a book into a TEdit component, then passing this value to the search field on the website https://isbnsearch.org, after which the website looks up the ISBN and displays the Author of the book. I want to access the information (i.e., the Author) presented in the search result and use it in my application.
This is my GUI, for a better idea of what I want to accomplish:
What code can I use for this? Any other feasible suggestions or approaches are acceptable.
When performing a search on that website, it simply loads a page with a specific URL query string...
https://isbnsearch.org/search?s=suess
The above example is when I search for "suess", so you can easily concatenate a search URL.
You can use any HTTP component, such as TIdHTTP, to load this search page, then use an HTML parser to scrape the page and read what you need. Much, much easier than trying to read through the TWebBrowser.
In the end, you won't actually display the HTML (I mean you can if you want to), but the idea is to read the data and display it in your own format.
On that specific page, start by locating the ul element with id searchresults. Each li element inside it contains an individual result. Unfortunately, this website uses pagination and only shows 10 results per page. To get the remaining results, call the page again with an extra parameter: &p=2 for the 2nd page, &p=3 for the 3rd page, and so on.
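The question is about Delphi, but the fetch-and-parse flow is the same in any language; here is a hedged sketch of it in Go with goquery (the selector follows the description above, everything else is illustrative):

package main

import (
    "fmt"
    "log"
    "net/http"
    "net/url"

    "github.com/PuerkitoBio/goquery"
)

func main() {
    // Build the query-string URL exactly as described above.
    search := "https://isbnsearch.org/search?s=" + url.QueryEscape("suess") + "&p=1"

    resp, err := http.Get(search)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        log.Fatal(err)
    }
    // Each result is an li inside the ul with id "searchresults";
    // bump the &p= parameter to walk the remaining pages.
    doc.Find("ul#searchresults li").Each(func(i int, s *goquery.Selection) {
        fmt.Println(s.Find("a").First().Text())
    })
}

In Delphi, the same steps would be a TIdHTTP.Get call followed by an HTML parser of your choice.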
On the other hand, that is the worst way to acquire such information. What you should be doing is using a proper API, which gives you machine-friendly data. The service you are referencing doesn't appear to offer one, but here's an example of one which does:
https://openlibrary.org/dev/docs/api/books - this also appears to provide you MUCH more information than the one you're using.

OpenRefine cannot fetch HTML code inside accordion

I know that OpenRefine is not a perfect tool for web scraping, but I am looking for some help with the first step.
I cannot collect the full HTML when I add a column in OpenRefine by fetching the URL (https://profiles.health.ny.gov/hospital/view/103094). The fetched pages do not include any of the markup inside the accordions, such as services, bed types, etc.
Any idea how to get the full markup by fetching in OpenRefine?
I am trying to collect the information under the Administrative section, whose XPath is "//div[4]/div/ul/li" ("div#AdministrativeBox.in.collapse").
This website loads its content dynamically using JavaScript. The information that interests you is not stored in the source code of the page, so OpenRefine cannot extract it.
However, there is a workaround. If you transform your URLs with the GREL formula value.replace('view', 'tab_overview'), you will get scrapable pages like this one.
Note that OpenRefine does not use XPath but jsoup selectors. To get the elements of the "Administrative" block, you can use this GREL formula:
forEach(value.parseHtml().select('#AdministrativeBox li'), e, e.htmlText()).join(',')
Result:

Can Go capture a click event in an HTML document it is serving?

I am writing a program for managing an inventory. It serves up HTML based on records from a PostgreSQL database, or writes to the database using HTML forms.
Different functions (adding records, searching, etc.) are accessible via <a></a> tags or form submits, which in turn call functions registered with http.HandleFunc(); those functions then generate queries, parse the results, and render them to HTML templates.
The search function renders query results to an HTML table. To keep the search results page usable and uncluttered, I intend to show only the most relevant information there. However, since many more details are stored in the database, I need a way to access that information too. To do that, I want each table row to be clickable, displaying the details of the selected record in a status area at the bottom or side of the page, for instance.
I could try to follow the pattern that works for running the other functions, that is use <a></a> tags and http.HandleFunc() to render new content but this isn't exactly what I want for a couple of reasons.
First: there should be no need to navigate away from the search results page to view the additional details; a single record's full data is small enough to render on the same page as the search results.
Second: I want the whole row clickable, not merely the text within a table cell, which is what the <a></a> tags get me.
Using the id returned from the database in an attribute, as in <div id="search-result-row-id-{{.ID}}"></div> I am able to work with individual records but I have yet to find a way to then capture a click in Go.
Before I run off and write this in JavaScript: does anyone know of a way to do this strictly in Go? I am not particularly averse to using the tried-and-true JS methods, but I am curious to see if it can be done without them.
does anyone know of a way to do this strictly in Go?
As others have indicated in the comments, no, Go cannot capture the event in the browser.
For that you will need some JavaScript to send the server (where Go runs) a request for more information.
You could also push all the required information to the browser when you first serve the page and hide/show it with CSS/JavaScript events, but again, that's just regular web development and has nothing to do with Go.
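A hedged sketch of the first option: a small Go handler that returns one record's details as JSON, which a one-line JavaScript click handler on each row would call. The route, ID scheme, and lookup here are placeholders, not a specific design.

package main

import (
    "encoding/json"
    "log"
    "net/http"
)

// Record stands in for whatever your PostgreSQL query returns.
type Record struct {
    ID      string `json:"id"`
    Details string `json:"details"`
}

func main() {
    // The browser-side JS on each row would be something like:
    //   fetch("/record?id=" + rowID).then(r => r.json()).then(showInStatusArea)
    http.HandleFunc("/record", func(w http.ResponseWriter, r *http.Request) {
        id := r.URL.Query().Get("id")
        rec := Record{ID: id, Details: "looked up in the database"} // placeholder lookup
        w.Header().Set("Content-Type", "application/json")
        json.NewEncoder(w).Encode(rec)
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}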

Pulling out some text from a giant HTML file using Nokogiri/xpath

I am scraping a website and am trying to pull out certain elements from the HTML. In the sites I am scraping, there are script tags with a bunch of info in them; however, there is one part inside these tags that I am interested in. The line basically looks like:
'image':'http://ut5.example.com/t/231/3_b_643435.jpg',
There is some other content above and below it. The line is different for each page source, except obviously for the domain and some of the subfolders that store the images.
How would I go about looking through the source for this specific line and cutting out just the URL? I feel I would need to use regular expressions, as the URLs are dynamic.
The "gsub" method does something similar to what I want to search for, with its ability to use /regex/. But, I am not wanting to replace anything, I just want to find that URL in the source code using a /regex/ and copy it.
According to your comments, this is what you're looking for, I guess:
var regex = /http.+/;
Example http://jsfiddle.net/Km9ZB/
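Note that /http.+/ is greedy and will also swallow the trailing ', on that line. Anchoring on the 'image' key and capturing just the URL is more precise; a sketch of that idea in Go (the asker's Ruby match or String#[] would take the same pattern):

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Example line from the page source, as quoted in the question.
    src := `'image':'http://ut5.example.com/t/231/3_b_643435.jpg',`

    // Capture group 1 is just the URL; [^']+ stops at the closing
    // quote instead of running to the end of the line.
    re := regexp.MustCompile(`'image':'(http[^']+)'`)
    if m := re.FindStringSubmatch(src); m != nil {
        fmt.Println(m[1]) // http://ut5.example.com/t/231/3_b_643435.jpg
    }
}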