Failing to query OpenLibrary - json

I want to compose a query to OpenLibrary's RESTful API that does the following:
filters the book list by the first five letters of the title
returns the book's title, author, publication date, description, and a link to the large thumbnail of the cover
So far this is all I've been able to compose with any success:
http://openlibrary.org/query.json?type=/type/edition&authors=/authors/OL1A&covers=&title=&publish_date=&description=
You can paste it into your browser to see the result; OpenLibrary doesn't require an API key.
My main obstacles seem to be:
I can't figure out how to filter the books by the first five letters of the title
I can't figure out how to turn the cover information into a link to the actual thumbnail
Any help?

The API does not let you search by the first five letters of the title, but you can write code that consumes the API and applies a regex to the results.
example:
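the first five letters searched: Bhānu
the regex would look like this: ^Bhānu, matched case-insensitively (note that [Bhānu] would be a character class matching any single one of those letters, not the prefix)
a matching title would look like this: "bhānumatīra deśa."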
link to regex example: https://regex101.com/r/cTVX1Z/4
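As a rough illustration of the client-side approach, here is a minimal Python sketch. It assumes the titles have already been fetched from the API into a list, and that the `covers` field of each result holds numeric cover IDs. The URL pattern in `cover_url` follows the one described by Open Library's Covers API; the function names and the sample ID 12345 are made up for the example.

```python
import re

def filter_by_prefix(titles, prefix):
    """Keep only the titles that start with `prefix`, ignoring case."""
    # re.escape guards against regex metacharacters in the prefix.
    pattern = re.compile("^" + re.escape(prefix), re.IGNORECASE)
    return [title for title in titles if pattern.search(title)]

def cover_url(cover_id, size="L"):
    """Build a cover image URL from a cover ID (size is S, M or L)."""
    return f"https://covers.openlibrary.org/b/id/{cover_id}-{size}.jpg"

# Hypothetical sample data standing in for an API response.
titles = ["bhānumatīra deśa.", "Bhagavad Gita"]
print(filter_by_prefix(titles, "Bhānu"))
print(cover_url(12345))
```

Filtering after the fetch means every candidate record is downloaded first, so this only makes sense when the other query parameters already narrow the list.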

Related

Create a web service to crawl public URLs and get reviews

I would like to find the reviews on a specific website (say, https://www.rogerebert.com/), locate the movie reviews section, and pull some of the basic info (the review text body, the title, the image, etc.).
Calling any regular site will return the full HTML, so I am wondering then how to format my request to say "hey, look for this specific section that has the word 'review' in it".
Any idea how to do this?
I can get the full HTML from a site (e.g., I used Postman to make a GET request to https://www.rogerebert.com/), but I don't know how to format that GET request to say "also look for specific sections".

How to retrieve a URL to a book cover only knowing its ISBN

Is there a way to obtain a URL to a book's cover when only knowing the book's ISBN?
I have tried two approaches so far.
First, https://openlibrary.org/dev/docs/api/covers, which does not work for me since it did not find any covers for the relatively new German books that I need the cover links for.
Then I tried the Google Books API, https://developers.google.com/books/docs/v1/using, which was more promising. It is possible to search by ISBN there, and the Google API found data for the books I'm interested in.
Unfortunately, the Google API returns a JSON object. This JSON object contains a link to the book's cover. However, I cannot use JavaScript, PHP, or anything similar in my application to retrieve the cover URL from the JSON object.
I can only use HTML and therefore need a direct link to a cover when providing a book ISBN instead of a JSON object containing the URL.
Ideally, a cover URL for a book with ISBN XXX would look like this
https://some_text/XXX/some_text such that I can directly use it as src in an HTML image container.
Does anyone have an idea on how to approach this problem?
Thanks in advance!

Wikipedia API - get random page(s)

I'm trying to get a JSON result with a set of random pages from Wikipedia, including their titles, content and images.
I've played around with their API sandbox, and so far the best I've got is this:
https://en.wikipedia.org/w/api.php?action=query&list=random&format=json&rnnamespace=0&rnlimit=10
But this only includes the namespace, id, and title of ten random pages. I would like to get the content and the images as well.
Does anyone know how?
Alternatively, I could make do with the title, content and image URLs of a single random page.
Best I've got here is:
https://en.wikipedia.org/w/api.php?action=query&generator=random&format=json
You're close. generator=random is the right way to go. You can then use various prop values to get the info you want:
Page title is always included.
To get the text, use prop=revisions along with rvprop=content.
To get all images used on the page, use prop=images.
Note that this will often include images you're probably not interested in, like icons and flags. To fix that, you might try prop=pageimages instead, though it doesn't always seem to work. Or you could try using both.
So, the final query could look like this:
https://en.wikipedia.org/w/api.php?format=json&action=query&generator=random&grnnamespace=0&prop=revisions|images&rvprop=content&grnlimit=10
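If you are assembling that query from code, a small sketch like the following may help. It only builds the URL from the parameters listed above and does not send the request; the function name `random_pages_url` is made up for the example.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def random_pages_url(limit=10):
    """Build the generator=random query URL for `limit` random articles."""
    params = {
        "format": "json",
        "action": "query",
        "generator": "random",
        "grnnamespace": 0,           # main (article) namespace only
        "grnlimit": limit,
        "prop": "revisions|images",  # page text plus the images used on the page
        "rvprop": "content",
    }
    # urlencode percent-encodes the "|" separator as %7C.
    return API + "?" + urlencode(params)

print(random_pages_url(10))
```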
If you'd rather use their REST API:
curl -X GET "https://en.wikipedia.org/api/rest_v1/page/random/summary"
Documentation

Get an article summary from the MediaWiki API

I am looking for a MediaWiki API with which I can get a short description for any query string. For example, if I search for Nicolas Cage, it should return a short description of him.
I tried http://en.wikipedia.org/w/api.php?%20format=json&action=query&titles=Nicolas%20Cage&prop=revisions&rvprop=content
I am not sure if prop=revisions is right. My intention is to get a short description on the final version of the page.
I also need another API that can give the link to the Wikipedia page (web / mobile) from the query string, i.e. for Nicolas Cage, http://en.wikipedia.org/wiki/Nicolas_cage should be returned.
There is no such thing as a page summary in MediaWiki by default, but you can get the first section of a page like this: http://en.wikipedia.org/w/api.php?action=parse&page=Nicolas_Cage&prop=text&section=0
If the wiki has the extension PageSummaries installed, you can use that to get exactly what you are asking for (like in this example from the extension description page).
To find pages matching a string, you use the open search function, like this: http://en.wikipedia.org/w/api.php?action=opensearch&search=Nicolas%20cage&namespace=0
Edit: @Bergi points out in the comments that open search also gives a summary of the page. I had somehow missed that.
Say, you want to get the summary of a search string Nicolas Cage.
Step 1. Get the page id: "https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=Nicolas%20Cage&format=json&srlimit=1"
Step 2. Use this page id to get section 0 of the page:
"https://en.wikipedia.org/w/api.php?action=parse&section=0&pageid=21111&prop=text&format=json"
Step 3. Parse as per requirements.
Step 3, extended for Python: use BeautifulSoup to select the target tags; get_text() returns the plain text.
Use prop=revisions with rvprop to get the latest revision; for anything further, go through the MediaWiki documentation.
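The steps above can be sketched in Python roughly as follows. This is only an outline: it builds the two URLs and processes already-decoded JSON, leaving the HTTP requests to you. To stay dependency-free it swaps BeautifulSoup for the standard library's html.parser, which is cruder (it keeps all text nodes, including any script or style content). All function names are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def search_url(term):
    # Step 1: ask for the single best-matching page for the search string.
    return API + "?" + urlencode({
        "action": "query", "list": "search", "srsearch": term,
        "format": "json", "srlimit": 1,
    })

def first_pageid(search_response):
    # list=search puts its hits under query.search, each with a "pageid".
    return search_response["query"]["search"][0]["pageid"]

def section0_url(pageid):
    # Step 2: request the rendered HTML of section 0 of that page.
    return API + "?" + urlencode({
        "action": "parse", "section": 0, "pageid": pageid,
        "prop": "text", "format": "json",
    })

class _TextCollector(HTMLParser):
    """Collects every text node and ignores the tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def html_to_text(html):
    # Step 3: reduce the returned HTML to plain text.
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.chunks)
```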
Alternate Solution:
Step 1. Get page title using step 1 above.
Step 2. Use the title as follows: https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=&titles=Nicolas%20Cage
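For the alternate solution, a minimal sketch of building the URL and pulling the text out of the decoded JSON might look like this. It assumes the default JSON layout, where query.pages is an object keyed by page id; the function names are made up for the example.

```python
from urllib.parse import urlencode

def intro_extract_url(title):
    """Build the prop=extracts URL for the plain-text intro of `title`."""
    params = {
        "format": "json",
        "action": "query",
        "prop": "extracts",
        "exintro": "",       # flag: only the text before the first heading
        "explaintext": "",   # flag: plain text instead of HTML
        "titles": title,
    }
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)

def first_extract(response):
    """Return the extract of the first page that has one, else None."""
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        if "extract" in page:
            return page["extract"]
    return None
```

Returning None rather than raising keeps the caller simple when the title does not exist or the page has no extract.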

MediaWiki API: how to choose a page from the response

When I make an API query, I sometimes get a list of several pages. For example,
http://en.wikipedia.org/wiki/Ask gives a lot of pages, but I need the website "Ask.com, a web search engine, formerly Ask Jeeves".
Can I query only a specific category ("websites")?
How can I check the category of each page in the response?
Thanks
There is no trivial way to do what you're asking. You could do something like this:
Get the list of pages that the disambiguation page links to. You could do this by listing the links on that page (action=query&prop=links).
Get the categories of all the pages from the previous step and use that to decide which one is the one you're looking for. This is not that simple, because Ask.com is not directly in Category:Websites, it's in one of its subcategories.
I get a list of several pages, for example http://en.wikipedia.org/wiki/Ask
The problem is that you're not getting a list of pages; you are just getting an ordinary page that is in the disambiguation pages category. To get the list, you need to get the links on that page.
Can I query only a specific category ("websites")?
No, MediaWiki does not support that.
How can I check the category of each page in the response?
Use the links property as a title list generator and get the categories of each page in the response. In your case, that would be http://en.wikipedia.org/w/api.php?action=query&titles=Ask&generator=links&prop=categories (don't forget to continue the query).
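Once you have the decoded JSON from that query, picking out the candidates could look something like the sketch below. It assumes the default JSON layout (query.pages keyed by page id, each page carrying a `categories` list of objects with a `title`); the function name is made up for the example. Because Ask.com sits in a subcategory rather than in Category:Websites itself, the sketch matches on a substring of the category name instead of an exact name.

```python
def titles_with_category(response, needle):
    """Return titles of pages with a category whose name contains `needle`."""
    matches = []
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        # Pages without the "categories" key are skipped; their categories
        # may only arrive in a later batch when the query is continued.
        for category in page.get("categories", []):
            if needle.lower() in category["title"].lower():
                matches.append(page["title"])
                break
    return matches
```

Run it on each batch of results while you continue the query, and merge the matches.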
If you are OK with "full-text search" for "ask",
you can do that like this:
http://en.wikipedia.org/w/api.php?format=json&action=query&generator=search&gsrsearch=ask%20incategory:%22Online%20companies%22&prop=info
As you can see, the search text is: ask incategory:"Online companies"
The same solution can also be seen at:
Wikipedia API: how to search for a term in a specific category
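If you build the search URL from code, the quoting of the category name is easy to get wrong by hand. A small sketch that leaves the encoding to urlencode (the function name is made up for the example):

```python
from urllib.parse import urlencode

def category_search_url(term, category):
    """Build a full-text search URL restricted to pages in `category`."""
    # The category name goes in double quotes because it may contain spaces.
    search = f'{term} incategory:"{category}"'
    params = {
        "format": "json",
        "action": "query",
        "generator": "search",
        "gsrsearch": search,
        "prop": "info",
    }
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)

print(category_search_url("ask", "Online companies"))
```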