Not sure if this is possible; God save me from parsing HTML with regex.
So I have a friend who wants a Premier League (English football league) standings feed on his website. I've been reading the Wikipedia API docs and can't seem to find a way to convert this HTML element into a JSON map. My query looks like this (very dirty); any help is much appreciated.
https://en.wikipedia.org/w/api.php?format=json&action=parse&page=2015%E2%80%9316_Premier_League&format=json&prop=text&section=6&callback=?
Here is the element I'm trying to target:
https://en.wikipedia.org/wiki/2015%E2%80%9316_Premier_League#League_table
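One way to avoid regex is to let an HTML parser walk the table. The sketch below uses only Python's standard library and assumes the API response has already been decoded and the HTML string pulled out of it (with action=parse and prop=text in the default JSON format it is usually found under parse, then text, then "*"). The class name TableRows, the helper table_to_records, and the sample rows are made up for illustration; real Wikipedia tables with rowspan/colspan or footnote markup would need extra handling.

```python
import json
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_records(html):
    """Turn the rows of an HTML table into dicts keyed by the header row."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample markup standing in for the HTML returned by the API.
sample = """
<table class="wikitable">
<tr><th>Pos</th><th>Team</th><th>Pts</th></tr>
<tr><td>1</td><td><a href="/wiki/Example_A">Example A</a></td><td>81</td></tr>
<tr><td>2</td><td><a href="/wiki/Example_B">Example B</a></td><td>71</td></tr>
</table>
"""
records = table_to_records(sample)
print(json.dumps(records, indent=2))
```

The result is a list of plain dicts, so json.dumps can serialize it directly for the website.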
Is there a way to obtain a URL for a book's cover when only knowing the book's ISBN?
I have tried two approaches so far.
First, https://openlibrary.org/dev/docs/api/covers, which does not work for me since it did not find any covers for the relatively new German books that I need cover links for.
Then I tried the Google Books API, https://developers.google.com/books/docs/v1/using, which was more promising. It is possible to search by ISBN there, and the Google API found data for the books I'm interested in.
Unfortunately, the Google API returns a JSON object. This JSON object contains a link to the book's cover. However, I cannot use JavaScript, PHP, or anything similar in my application to retrieve the cover URL from the JSON object.
I can only use HTML and therefore need a direct link to a cover when providing a book ISBN instead of a JSON object containing the URL.
Ideally, a cover URL for a book with ISBN XXX would look like this
https://some_text/XXX/some_text such that I can directly use it as src in an HTML image container.
Does anyone have an idea on how to approach this problem?
Thanks in advance!
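If a preprocessing step outside the HTML page is an option, the cover link can be pulled out of the Google Books response once and then pasted into the page as a plain src. The following is only a sketch: it assumes the volumes endpoint is queried with q=isbn:XXX and that the link sits under items, volumeInfo, imageLinks, thumbnail, as described in the Google Books documentation. The helper names lookup_url and cover_url are hypothetical, and the sample response is a hand-written placeholder, not real API output.

```python
import json
from urllib.parse import urlencode

GOOGLE_BOOKS_VOLUMES = "https://www.googleapis.com/books/v1/volumes"


def lookup_url(isbn):
    """Build the Google Books search URL for one ISBN."""
    return GOOGLE_BOOKS_VOLUMES + "?" + urlencode({"q": "isbn:" + isbn})


def cover_url(response):
    """Return the first thumbnail link in a volumes response, or None."""
    for item in response.get("items", []):
        links = item.get("volumeInfo", {}).get("imageLinks", {})
        if "thumbnail" in links:
            return links["thumbnail"]
    return None


# Hand-written stand-in for a real API response (placeholder values only).
sample = json.loads("""
{"totalItems": 1,
 "items": [{"volumeInfo": {"title": "Placeholder title",
                           "imageLinks": {"thumbnail": "https://example.org/cover.jpg"}}}]}
""")
print(lookup_url("XXX"))
print(cover_url(sample))
```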
I use an API that returns JSON with prefetched data from a CMS. Part of the JSON looks like this:
"content": "&lt;p&gt;&lt;em&gt;We're looking for a pro-active, analytical, commercially minded and ambitious deal-closer who loves to work - and play - hard to join our Company.
I then pass this data to a child component and then render it by using v-html. I expected this to output the HTML tags with styling and semantics. However, it renders the HTML tags as plain text:
<p><em>We're looking for a pro-active, analytical, commercially minded and ambitious deal-closer who loves to work - and play - hard to join our Company.
Does anyone know what I am doing wrong? Should I have parsed the JSON? Should I have decoded the raw JSON to HTML tags first?
Nothing to do with JSON; everything to do with your web service unhelpfully giving you unparsed HTML.
You're going to have to decode these HTML entities yourself.
One common trick is to feed the unparsed HTML to an off-DOM element, then read it back via textContent, which will give you the parsed version.
const p = document.createElement('p');
// Assigning the escaped string lets the browser decode the entities.
p.innerHTML = '&lt;p&gt;';
console.log(p.textContent); // "<p>"
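If the decoding can happen on the server before the JSON reaches the page, the same idea works without a DOM. This is a different technique from the off-DOM element trick: a minimal sketch using Python's standard html.unescape, with a shortened, made-up sample string standing in for the CMS field.

```python
import html

# Shortened stand-in for the escaped "content" field from the CMS.
escaped = "&lt;p&gt;&lt;em&gt;We're looking for a deal-closer&lt;/em&gt;&lt;/p&gt;"

# html.unescape turns character references such as &lt; back into "<".
decoded = html.unescape(escaped)
print(decoded)

# Content that was escaped twice needs two passes.
double_escaped = "&amp;lt;p&amp;gt;"
once = html.unescape(double_escaped)
twice = html.unescape(once)
print(once, twice)
```

Only decode content that comes from a trusted source, since the decoded string is then rendered as live HTML by v-html.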
I would like to parse the content of a Wikipedia page, but I am missing something. Can someone help me?
Example:
I have a Wikipedia page:
https://it.wikipedia.org/wiki/Anni_690_a.C.
On this page a Chinese politician is mentioned: "Jin Wen Gong"
I tried the following web service call to get the content, but the JSON contains no data about "Jin Wen Gong".
https://it.wikipedia.org/w/api.php?action=query&prop=revisions&rvlimit=1&titles=Anni_690_a.C.&rvprop=content&format=json
How do I parse Wikipedia correctly?
The part you are looking for is not directly in the contents of that page, which you can see if you start editing the page: there is no mention of Jin Wen Gong in the source either.
The part where you see it is generated from this piece of wiki-code:
{{Bio decennio a.C.|Morti|69}}
This code is in the JSON.
On Wikipedia, that template expands to a list of people (probably people who died in the period in question, if I'm reading the Italian correctly).
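To make the difference concrete, here is a small Python sketch. It assumes that asking the API for action=parse (the same action used for rendered HTML elsewhere) returns the page with templates expanded, whereas the revisions query returns raw wikitext. The helper names rendered_html_url and template_names are hypothetical, and the regular expression is a rough heuristic for spotting top-level template names, not a full wikitext parser.

```python
import re
from urllib.parse import urlencode

API = "https://it.wikipedia.org/w/api.php"


def rendered_html_url(title):
    """URL asking the API for the rendered HTML of a page (templates expanded)."""
    return API + "?" + urlencode(
        {"action": "parse", "page": title, "prop": "text", "format": "json"}
    )


def template_names(wikitext):
    """Names of the templates referenced in a piece of raw wikitext."""
    return [name.strip() for name in re.findall(r"\{\{([^{}|]+)", wikitext)]


# The template line comes from the page source; the surrounding text is made up.
wikitext = "Some text\n{{Bio decennio a.C.|Morti|69}}\n"
print(template_names(wikitext))
print(rendered_html_url("Anni_690_a.C."))
```

Listing the templates this way shows which parts of the visible page are generated elsewhere and will never appear in the raw revision content.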
http://www.bloomberg.com/markets/ has several figures that I would like to display on my html page.
If I just have a div and want it to display the percentage change of some financial market, how do I get the div to display whatever figure is published on Bloomberg, so that whenever I reload my website the most up-to-date figure from Bloomberg is displayed in plain text in my div?
So instead of
<div>0.05%</div>
I have
<div>(some code here to pull the correct figure from bloomberg)</div>
Bloomberg has an API that you can use to get their market data for free:
http://www.openbloomberg.com/open-api/
Now, you can adopt Bloomberg’s market data interfaces without cost or restriction.
What you are asking for is called data parsing, and it is a pretty common request. If you want to do it in PHP, the PHP Simple HTML DOM Parser and phpQuery projects provide plenty of examples.
Hi, I need to retrieve the URL of the first article for a term I search for on nytimes.com.
So if I search for Apple, this link returns the results:
http://query.nytimes.com/search/sitesearch?query=Apple&srchst=cse
And you just replace Apple with the term you are searching for.
If you click on that link, you will see that the NY Times asks whether you mean Apple Inc.
I want to get the url for this link, and go to it.
Then you will just get a lot of information on Apple Inc.
If you scroll down you will see the articles related to Apple.
So what I ultimately want is the URL of the first article on this page.
So I really do not know how to go about this. Do I use Java, or what do I use? Any help would be greatly appreciated and I would put a bounty on this later, but I need the answer ASAP.
Thanks
EDIT: Can we do this in Java?
You can use Python with the standard urllib module to fetch the pages and the great HTML parser BeautifulSoup to obtain the information you need from the pages.
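Adapted from the BeautifulSoup documentation, here's sample code that fetches a web page and extracts some info from it (ported to Python 3 and the bs4 package):
from urllib.request import urlopen
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

page = urlopen("http://www.icc-ccs.org/prc/piracyreport.php")
soup = BeautifulSoup(page, "html.parser")
# Each incident sits in a <td width="90%"> cell: location, a line break, then the description.
for incident in soup("td", width="90%"):
    where, linebreak, what = incident.contents[:3]
    print(where.strip())
    print(what.strip())
    print()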
This is a nice and detailed article on the topic.
You certainly can do it in Java. Look at the HttpURLConnection class. Basically, you give it a URL, call the connect function, and you get back an input stream with the contents of the page, i.e. HTML text. You can then process that and parse out whatever information you want.
You're facing two challenges in the project you are describing. The first, and probably the lesser, challenge is figuring out the mechanics of how to connect to a web page and get hold of the text within your program. The second, and probably bigger, challenge will be to figure out exactly how to extract the information you want from that text.
I'm not clear on the details of your requirements, but you're going to have to sort through a ton of text to find what you're looking for. Without actually looking at the NY Times site at the moment, I'm sure it has all sorts of decorations like pretty pictures, the company logo, and headlines, and then there are going to be menus, advertisements, and all sorts of other stuff. I sincerely doubt that the NY Times or almost any other commercial web site is going to return a search page that includes nothing but a link to the article you are interested in. Somehow your program will have to figure out that the first link is to the "subscribe online" page, the second is to an advertisement, the third is to customer service, the fourth and fifth are additional advertisements, the sixth is to the home page, and so on, until you finally get to the one you're actually interested in. How will you identify the interesting link? There are probably headings or formatting that make it recognizable to a human being, but you use a lot of intuition to screen out the clutter, and that can be difficult to reproduce in a program.
Good luck!
You can do this in C# using the HTML Agility Pack, or using LINQ to XML if the site is valid XHTML. EDIT: It isn't valid XHTML; I checked.
The following (tested) code will get the URL of the first search result:
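// Requires the HtmlAgilityPack package.
using System.Linq;
using HtmlAgilityPack;

var doc = new HtmlWeb().Load(@"http://query.nytimes.com/search/sitesearch?query=Apple&srchst=cse");
// Take the first link inside <ul class="results"> and decode HTML entities in its href.
var url = HtmlEntity.DeEntitize(doc.DocumentNode.Descendants("ul")
    .First(ul => ul.Attributes["class"] != null
              && ul.Attributes["class"].Value == "results")
    .Descendants("a")
    .First()
    .Attributes["href"].Value);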
Note that if their website changes, this code might stop working.
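The same selection rule (the first link inside the list marked with the class "results") can also be sketched with nothing but Python's standard library. This is an illustration, not a port of the tested C# code: the class FirstResultLink, the helper first_result_link, and the sample markup are all made up, and it matches "results" as one of possibly several class names rather than as the exact attribute value.

```python
from html.parser import HTMLParser


class FirstResultLink(HTMLParser):
    """Remember the href of the first <a> inside <ul class="results">."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while inside the results list (counts nested <ul>)
        self.href = None  # first link found so far

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul":
            classes = (attrs.get("class") or "").split()
            if self.depth or "results" in classes:
                self.depth += 1
        elif tag == "a" and self.depth and self.href is None:
            self.href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "ul" and self.depth:
            self.depth -= 1


def first_result_link(page_html):
    """Return the first link inside the results list, or None if there is none."""
    parser = FirstResultLink()
    parser.feed(page_html)
    parser.close()
    return parser.href


# Made-up page: navigation clutter first, then the list we actually care about.
sample = """
<a href="/subscribe">Subscribe</a>
<ul class="nav"><li><a href="/home">Home</a></li></ul>
<ul class="results">
  <li><a href="https://example.org/first-article">First</a></li>
  <li><a href="https://example.org/second-article">Second</a></li>
</ul>
"""
print(first_result_link(sample))
```

Anchoring on a container class instead of counting links skips the navigation and advertisement clutter, but it is just as fragile as any other scraper if the site's markup changes.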