How do I grab figures from a url and place them in my page? - html

http://www.bloomberg.com/markets/ has several figures that I would like to display on my html page.
If I just have a div and say I want it to display how much percent some financial market has changed, how to I get the div to display whatever figure is published to Bloomberg? So that whenever I reload my website the most up to date figure from Bloomberg is displayed in plain text in my div?
So instead of
<div>0.05%</div>
I have
<div>(some code here to pull the correct figure from bloomberg)</div>

Bloomberg has an API that you can use to get their market data for free:
http://www.openbloomberg.com/open-api/
Now, you can adopt Bloomberg’s market data interfaces without cost or restriction.

What you are asking is called data parsing and it is pretty common request. If you want to do it using PHP, PHP Simple HTML DOM parser or phpQuery provide plenty of examples.

Related

Send and receive data to and from a website using the TWebbrowser component in Delphi

I'm creating a VCL Application with Delpi 10.3 and want to support some web functionality by having the user enter the ISBN of a book into a TEdit component and from there passing/sending this value to a search field on this website: https://isbnsearch.org after which the website looks up the ISBN and displays the Author of the book. I want to somehow access the information (i.e Author) presented by the search result and again use it in my application.
This is my GUI, for a better idea of what I want to accomplish:
What code can I use for this? Any other feasible suggestions or approaches are acceptable.
When performing a search on that website, it simply loads a page with a specific URL query string...
https://isbnsearch.org/search?s=suess
The above example is when I search for "suess", so you can easily concatenate a search URL.
You can use any HTTP component, such as TIdHTTP, to load this search page, then use an HTML parser to scrape the page and read what you need. Much, much easier than trying to read through the TWebBrowser.
In the end, you won't actually display the HTML (I mean you can if you want to), but the idea is to read the data and display it in your own format.
On that specific page, start by locating the ul element with id searchresults. Then, each li element contains individual results. Unfortunately, this website uses pagination, and only shows 10 results per page. To do this, call this page again with another parameter &p=2 for the 2nd page, &p=3 for the 3rd page, and so on.
On the other hand, that is the worst way to acquire such information. What you should be doing is using a proper API which gives you machine-friendly data. The service you are referencing doesn't appear to have an option, but here's an example of one which does:
https://openlibrary.org/dev/docs/api/books - this also appears to provide you MUCH more information than the one you're using.

Wikipedia MediaWiki API Get Template Content via URL Request

I've been going through the doc's for past few hours and simply can't seem to figure this out though probably simple.
I have this link:
http://en.wikipedia.org/w/api.php?format=xml&action=expandtemplates&titles=Arabinose&text={{Chembox%20Elements}}&prop=wikitext
Which obviously will give me the schema of Template Chembox | Chembox Elements in this case.
All I simply want is to retrieve the Molecular forumla content/data/value for the given page/title without having to parse the entire wiki content at my end.
Understand I have prop=wikitext which will be returning wikitext in the above example, there's no option in expandtemplates for prop=text. I've been back and forth with action=query, expandedtemplates etc and no joy.
MediaWiki's API won't do the work for you. You'll have to parse your own results.

Automate Web Applications -parsing HTML Data

I just want to automate a web application, where that application parses the HTML page and pulls all the HTML Tags inner text based on some condition like if we have a tag called Span Example has given whose class="spanclass_1"
This is span tag...
which has particular class id. so that app parses and pulls that span into it.
And here the main pain area is, I should not use the developer code to automate that same parsing the HTML.
I want to automate that parsing done correctly, simply by using the parsed data which is shown in UI.
Any help, would be great.
Appreciating your time reading this.
(Note span tag is not shown)
Thanks buddies.
not enough details.
is this html page just a file in local filesystem on it is internet webpage?
do u have access to pages? can u modify it ? if answer yes, that just add javascript to page which will extract data and post to server.
if answer not, than it depends on language u use to programm.
Find good framework to parse html. load page parse it and extract data. Several situation can be there.
Worse scenario - page generated on client side using js.
Best scenario - page is in xhtml mode( u are lucky. any xml parser will help to build dom and extract data)
So so - page is simple html format (try several html parser to find most suitable for u)

Display html articles in a easy to read format

I have looked at the readability api which is useful to display data in a clean format on a html webpage. I am passing a Url to http://www.readability.com/read?url= to display the data. I am initially directed to a page where I can choose to view the info using readability is there any way I can directly view the content in a neat fashion without going through the actual re-direct?
take a look at Readability's API: http://www.readability.com/developers/api
Before you implement your code, you have to create an API Key on their website.

Parsing a website and getting the info I need

hi so I need to retrieve the url for the first article on a term I search up on nytimes.com
So if I search for Apple. This link would return the result
http://query.nytimes.com/search/sitesearch?query=Apple&srchst=cse
And you just replace Apple with the term you are searching for.
If you click on that link you would see that NYtimes ask you if you mean Apple Inc.
I want to get the url for this link, and go to it.
Then you will just get a lot of information on Apple Inc.
If you scroll down you will see the articles related to Apple.
So what I ultimately want is the URL of the first article on this page.
So I really do not know how to go about this. Do I use Java, or what do I use? Any help would be greatly appreciated and I would put a bounty on this later, but I need the answer ASAP.
Thanks
EDIT: Can we do this in Java?
You can use Python with the standard urllib module to fetch the pages and the great HTML parser BeautifulSoup to obtain the information you need from the pages.
From the documentation of BeautifulSoup, here's sample code that fetches a web page and extracts some info from it:
import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen("http://www.icc-ccs.org/prc/piracyreport.php")
soup = BeautifulSoup(page)
for incident in soup('td', width="90%"):
where, linebreak, what = incident.contents[:3]
print where.strip()
print what.strip()
print
This this is a nice and detailed article on the topic.
You certainly can do it in Java. Look at the HttpURLConnection class. Basically, you give it a URL, call the connect function, and you get back an input stream with the contents of the page, i.e. HTML text. You can then process that and parse out whatever information you want.
You're facing two challenges in the project you are describing. The first, and probably really the lesser challenge, is figuring out the mechanics of how to connect to a web page and get hold of the text within your program. The second and probably bigger challenge will be to figure out exactly how to extract the information you want from that text. I'm not clear on the details of your requirements, but you're going to have to sort through a ton of text to find what you're looking for. Without actually looking at the NY Times site at the momemnt, I'm sure it has all sorts of decorations like pretty pictures and the company logo and headlines and so on, and then there are going to be menus and advertisements and all sorts of stuff. I sincerely doubt that the NY Times or almost any other commercial web site is going to return a search page that includes nothing but a link to the article you are interested in. Somehow your program will have to figure out that the first link is to the "subscribe on line" page, the second is to an advertisement, the third is to customer service, the fourth and fifth are additional advertisements, the sixth is to the home page, etc etc until you finally get to the one you're actually interested in. How will you identify the interesting link? There are probably headings or formatting that make it recognizable to a human being, but you use a lot of intuition to screen out the clutter that can be difficult to reproduce in a program.
Good luck!
You can do this in C# using the HTML Agility Pack, or using LINQ to XML if the site is valid XHTML. EDIT: It isn't valid XHTML; I checked.
The following (tested) code will get the URL of the first search result:
var doc = new HtmlWeb().Load(#"http://query.nytimes.com/search/sitesearch?query=Apple&srchst=cse");
var url = HtmlEntity.DeEntitize(doc.DocumentNode.Descendants("ul")
.First(ul => ul.Attributes["class"] != null
&& ul.Attributes["class"].Value == "results")
.Descendants("a")
.First()
.Attributes["href"].Value);
Note that if their website changes, this code might stop working.