Google Feed API - not returning feed URLs

When I use the Google Feed API's find search, it often doesn't return the feed URLs.
For example, a find search for the query 'CNN' points to this URL:
https://ajax.googleapis.com/ajax/services/feed/find?v=1.0&q=cnn
And returns these results (trimmed):
{
"responseData": {
"query": "cnn",
"entries": [
{
"url": "",
"title": "<b>CNN</b>.com: Breaking News, US, World, Weather, Entertainment <b>...</b>",
"contentSnippet": "Find the latest breaking news and information on the top stories, weather, <br>\nbusiness, entertainment, politics, and more. For in-depth coverage, <b>CNN</b> <br>\nprovides ...",
"link": "http://www.cnn.com/"
},
{
"url": "",
"title": "World News - International Headlines, Stories and Video - <b>CNN</b>.com",
"contentSnippet": "<b>CNN</b> brings you International News stories, video and headlines from Europe, <br>\nAsia, Africa, the Middle East, and the Americas.",
"link": "http://www.cnn.com/world"
},
{
"url": "",
"title": "U.S. News - Headlines, Stories and Video - <b>CNN</b>.com",
"contentSnippet": "<b>CNN</b>.com brings you US News, Videos and Stories from around the country.",
"link": "http://www.cnn.com/us"
},
{
"url": "",
"title": "<b>CNN</b> (#<b>CNN</b>) | Twitter",
"contentSnippet": "The latest Tweets from <b>CNN</b> (#<b>CNN</b>). It's our job to #GoThere and tell the most <br>\ndifficult stories. Come with us!",
"link": "https://twitter.com/cnn"
},
{
"url": "http://gdata.youtube.com/feeds/base/users/CNN/uploads?alt=rss&v=2&orderby=published&client=ytapi-youtube-profile",
"title": "<b>CNN</b> - YouTube",
"contentSnippet": "<b>CNN</b> operates as a division of Turner Broadcasting System, which is a subsidiary <br>\nof Time Warner. <b>CNN</b> identifies itself as -- and is widely known to be - the m...",
"link": "http://www.youtube.com/user/CNN"
}
]
},
"responseDetails": null,
"responseStatus": 200
}
The first 4 results don't have a URL attached. The 5th one does, but isn't relevant, so my initial idea to remove items with empty URLs wouldn't work.
In the basic query example in their developer guide, the url field of each result is filled in. Copying and pasting the URL they provide returns similar results, but with the url field empty.
Unless I'm mistaken, this must be a problem with Google's results.
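Here is a small Python sketch of the "remove items with empty URLs" idea, run against a hand-trimmed copy of the CNN response (titles and snippets dropped). It shows why that filter alone doesn't help: only the YouTube entry survives. `split_entries` is an illustrative helper, not part of the Feed API.

```python
# Hand-trimmed copy of the "find" response for q=cnn (titles/snippets omitted).
response = {
    "responseData": {
        "query": "cnn",
        "entries": [
            {"url": "", "link": "http://www.cnn.com/"},
            {"url": "", "link": "http://www.cnn.com/world"},
            {"url": "", "link": "http://www.cnn.com/us"},
            {"url": "", "link": "https://twitter.com/cnn"},
            {
                "url": "http://gdata.youtube.com/feeds/base/users/CNN/uploads"
                       "?alt=rss&v=2&orderby=published&client=ytapi-youtube-profile",
                "link": "http://www.youtube.com/user/CNN",
            },
        ],
    },
    "responseDetails": None,
    "responseStatus": 200,
}


def split_entries(resp):
    """Separate find results that carry a feed URL from those that don't."""
    entries = resp["responseData"]["entries"]
    with_feed = [e for e in entries if e.get("url")]
    without_feed = [e for e in entries if not e.get("url")]
    return with_feed, without_feed


with_feed, without_feed = split_entries(response)
for e in with_feed:
    print("feed:", e["url"])
for e in without_feed:
    print("no feed URL for", e["link"])
```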

Related

Deserializing Nested JSON API response with Django

I'm pretty new to DRF and to serializing/deserializing. I'm slowly building a dashboard for my business during the coronavirus pandemic while learning to code. I'm in a little deep, but after spending more than $10k on developers on Upwork without getting much in return, I figured: what do I have to lose?
Our software provider has a full API for our needs (https://developer.myvr.com/api/) but absolutely no dashboard for reporting statistics about our clients' reservation data.
The end result will be a synchronization of some of the data from their API to my database, which will be hosted on AWS. I chose this approach because some of the API data needs post-processing. For example, we need to calculate occupancy rates (which are not an endpoint), expenses from our accounting connection, and a few other small figures that the API doesn't provide directly. I originally wanted to rely on the API data alone, but I'm hesitant for the reasons above.
That's the backstory, here are the questions:
The API response is extremely complex and nested several levels deep. What is the best practice for replicating the structure of the data in my own database? Would I have to create models for each field manually?
Example response:
```
{
"uri": "https://api.myvr.com/v1/properties/b6b0f2fe278f612b/",
"id": "b6b0f2fe278f612b",
"key": "b6b0f2fe278f612b",
"accessDescription": null,
"accommodates": 11,
"active": false,
"addressOne": "11496 Zermatt Dr",
"addressTwo": null,
"allowTurns": true,
"amenities": "https://api.myvr.com/v1/property-amenities/?propertyId=b6b0f2fe278f612b",
"automaticallyApprove": false,
"baseNightlyRate": "395.00",
"baseRate": {
"uri": "https://api.myvr.com/v1/rates/660c299d4785c32e/",
"id": "660c299d4785c32e",
"key": "660c299d4785c32e",
"externalId": null,
"baseRate": true,
"changeoverDay": null,
"created": "2019-01-19T08:02:36Z",
"currency": "USD",
"endDate": "2020-01-18",
"minStay": 3,
"modified": "2019-01-19T08:02:36Z",
"monthly": 0,
"name": "Base Rate",
"weekNight": 39500,
"nightly": 39500,
"position": 0,
"property": {
"name": "API Demo Property",
"uri": "https://api.myvr.com/v1/properties/b6b0f2fe278f612b/",
"id": "b6b0f2fe278f612b",
"externalId": null,
"key": "b6b0f2fe278f612b",
"slug": "api-demo-property"
},
"ratePlan": {
"uri": "https://api.myvr.com/v1/rate-plans/862caa3f5267602d/",
"key": "862caa3f5267602d",
"name": "Default Rates for Property"
},
"repeat": true,
"startDate": "2020-01-18",
"weekend": 0,
"weekendNight": 0,
"weekly": 250000
},
"bathrooms": "4.0",
"bedrooms": 4,
"bookingUrl": "https://myvr.com/reservation/redirect/booking/b6b0f2fe278f612b/",
"checkInTime": "16:00:00",
"checkOutTime": "10:00:00",
"city": "Truckee",
"commissionStructure": null,
"countryCode": "US",
"created": "2016-01-19T00:01:48Z",
"currency": "USD",
"customFields": {},
"description": "Luxurious living, scenic mountain setting, entertainment galore. Located on a quiet street in Tahoe Donner, our well equipped modern home is nestled into the wilderness. A babbling creek greets visitors approaching the front step as it collects into a small pond with a cascading waterfall. <br/>\n<br/>\nInside, over 3,000 sqft of luxurious living space divides itself between two floors. On the first floor, a beautiful kitchen with granite counters, gas stove and stainless steel appliances opens to a large great room centered around a wood burning fireplace and featuring 30' soaring ceilings. A spacious loft overlooks the great room, showcasing a large poker/card table. Upstairs features a large entertainment room, complete with wet bar, shuffleboard table, and state-of-the-art television setup with surround sound. The scenic backyard is accessible from a large deck featuring a new hot tub with seating for 7.",
"externalId": null,
"feePlan": {
"uri": "https://api.myvr.com/v1/fee-plans/4d1c44383755051b/",
"key": "4d1c44383755051b",
"name": "Default Fees for Listing"
},
"headline": "Beautiful Four Bedroom Lake Front Property",
"houseRules": null,
"instantBookingsEnabled": false,
"lat": "39.3422523000",
"level": "unit",
"localAreaDescription": "Tahoe Donner is a year round activity resort. The amenities include private beach/boat launching facilities, pools, recreation center, tennis, horseback riding, golf, downhill skiing as well as cross country skiing. Truckee is a historical mining town-having a western feel but also has museums, theaters, fine dining plus 2 large supermarkets-all less than 3 miles from the house. Our home is also located within a 15 minute drive to 4 major ski resorts. Downtown Reno is a short 40 minute drive away for those seeking a night on the town or the thrill of a Nevada casino.",
"lon": "-120.2271947000",
"lowestNightlyRate": "395.00",
"manual": "",
"modified": "2019-10-18T17:18:43Z",
"name": "API Demo Property",
"owner": null,
"postalCode": "96161",
"ratePlan": {
"uri": "https://api.myvr.com/v1/rate-plans/862caa3f5267602d/",
"key": "862caa3f5267602d",
"name": "Default Rates for Property"
},
"ratePlanLocked": false,
"region": "CA",
"shortCode": "API",
"size": 3000,
"slug": "api-demo-property",
"suitableElderly": "yes",
"suitableEvents": "unknown",
"suitableGroups": "yes",
"suitableHandicap": "no",
"suitableInfants": "unknown",
"suitableKids": "yes",
"suitablePets": "no",
"suitableSmoking": "no",
"transitDescription": null,
"type": "house",
"weekendNights": [
5,
6
]
}
```
I think the best way to populate the database would be a custom management command run as a one-off script; I've done this before with another database. However, I'm still stuck, as I don't really want to write these models manually. Another concern is what happens if a field is missing or the structure changes.
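One way to see which fields you would actually need is to flatten the nested response into dotted keys and inspect the result. This is a minimal, framework-agnostic Python sketch; the `flatten` helper is hypothetical (not part of Django or DRF), and the sample is a small subset of the property response above. Missing keys simply don't appear in the output, so nothing raises `KeyError`.

```python
import json


def flatten(data, prefix=""):
    """Flatten nested dicts into a single dict with dotted keys.

    Non-empty nested dicts are expanded; lists, scalars and empty dicts
    are kept as values.
    """
    flat = {}
    for key, value in data.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


# A small subset of the property response above.
sample = json.loads("""
{
  "id": "b6b0f2fe278f612b",
  "accommodates": 11,
  "baseRate": {
    "nightly": 39500,
    "ratePlan": {"name": "Default Rates for Property"}
  },
  "customFields": {},
  "weekendNights": [5, 6]
}
""")

for field, value in flatten(sample).items():
    print(field, "=", value)
```

The dotted names can guide which columns a model needs, or which nested objects (such as `baseRate`) deserve their own model. One alternative design is to store the raw payload in a `models.JSONField` (Django 3.1+) alongside a few extracted columns, so a structure change doesn't break the import.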
This project is definitely above my skills and extremely ambitious, but I would appreciate any feedback or advice anyone might have.
Thanks,
Darren

I didn't really get any interest in this question, but I ended up working it out myself.
I hope someone googling this finds it helpful.
```
import time

import requests
from django.core.management.base import BaseCommand

from reservation.models import Reservation

MYVR_URL = 'https://api.myvr.com/'
MYVR_RESERVATION = 'v1/reservations/?limit=100'
headers = {
    'Authorization': 'Basic SOmeAPiCodeHeRe123=',
}


class Command(BaseCommand):
    help = 'Imports reservations from the MyVR API and saves them in the database'

    def handle(self, *args, **options):
        url = MYVR_URL + MYVR_RESERVATION
        print("Populating Reservations")

        def looping_api(url, headers):
            # Keep following the "next" link until the API returns null.
            while url:
                start_time = time.time()
                r = requests.get(url, headers=headers)
                r.raise_for_status()
                data = r.json()
                props_data = data.get('results') or []
                for prop in props_data:
                    try:
                        # Match on the MyVR key only; the other fields go in
                        # `defaults`, so re-running updates existing rows
                        # instead of creating duplicates when a value changes.
                        obj, created = Reservation.objects.update_or_create(
                            myvr_key=prop.get('key'),
                            defaults={
                                'adults': prop.get('adults'),
                                'children': prop.get('children'),
                                'checkin': prop.get('checkIn'),
                                'checkout': prop.get('checkOut'),
                                'checkinTime': prop.get('checkInTime'),
                                'checkoutTime': prop.get('checkOutTime'),
                                'guestFirstName': prop.get('firstName'),
                                'dateCreated': prop.get('created'),
                                'dateBooked': prop.get('dateBooked'),
                                'dateCancelled': prop.get('dateCanceled'),
                                'contact': prop.get('contact').get('name'),
                                'contact_key': prop.get('contact').get('key'),
                                'guest_type': prop.get('guestType'),
                                'property_name': prop.get('property').get('name'),
                                'property_key': prop.get('property').get('key'),
                                'source_code': prop.get('source').get('code'),
                                'source_name': prop.get('source').get('name'),
                                'total_due': prop.get('quote').get('totalDue'),
                                'total_refundables': prop.get('quote').get('totalRefundableFees'),
                                'total_nonrefundables': prop.get('quote').get('totalNonrefundableFees'),
                                'reference_id': prop.get('referenceId'),
                            },
                        )
                        action = "Added" if created else "Updated"
                        print(f"{action} obj {prop.get('key')}")
                    except AttributeError as error:
                        # A nested object (contact, property, source or quote)
                        # was null, e.g. for an owner booking.
                        print(f"{error} attribute is null or owner booking")
                url = data.get('next')
                print(url)
                print(len(props_data))
                print(time.time() - start_time)

        looping_api(url, headers)
```
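The paging logic in that command can be separated from the database code. Below is a hedged sketch of the same "follow `next` until it is null" pattern as a Python generator; the `iter_results` name and the injected `fetch` callable are illustrative assumptions, used so the loop can be exercised without calling the real MyVR API.

```python
from typing import Callable, Iterator, Optional


def iter_results(fetch: Callable[[str], dict], url: Optional[str]) -> Iterator[dict]:
    """Yield every item from a paginated API that links pages via "next".

    `fetch` takes a URL and returns the decoded JSON page; with requests it
    could be something like: lambda u: requests.get(u, headers=headers).json()
    """
    while url:
        page = fetch(url)
        yield from page.get("results") or []
        url = page.get("next")


# Fake two-page API so the sketch runs without network access.
pages = {
    "page1": {"results": [{"key": "a"}, {"key": "b"}], "next": "page2"},
    "page2": {"results": [{"key": "c"}], "next": None},
}

keys = [item["key"] for item in iter_results(pages.__getitem__, "page1")]
print(keys)  # ['a', 'b', 'c']
```

In `handle()`, the nested function could then shrink to `for prop in iter_results(fetch, url): ...` with the `update_or_create` call unchanged, which keeps the HTTP paging and the database writes independently testable.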

Wikipedia API cloud of confusion: `list` vs `generator` vs `search`

My goal is to write an API search of Wikipedia that will:
1. Return pages only in the category "English-language films".
2. Of those, return pages only beginning with the letters "Avatar" (or anything, really; I'm just using those letters as an example).
3. Of those, give me the url, the title, the date, a summary, and the main page image.
So far I've tried five things, none of which seem to be able to do exactly what I want.
PROBLEM: list=allpages is just generally useless
Here's a search with list=allpages:
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=info%7Cpageimages%7Cextracts&list=allpages&inprop=url%7Cdisplaytitle&piprop=name%7Cthumbnail%7Coriginal&pithumbsize=100&exintro=1&explaintext=1&apprefix=Avatar&aplimit=3
Here's the result (just the first 3 pages):
{
"batchcomplete": "",
"continue": {
"apcontinue": "Avatar,_The_Last_Airbender",
"continue": "-||info|pageimages|extracts"
},
"query": {
"allpages": [
{
"pageid": 100368,
"ns": 0,
"title": "Avatar"
},
{
"pageid": 4846971,
"ns": 0,
"title": "Avatar's Abode"
},
{
"pageid": 35243953,
"ns": 0,
"title": "Avatar, Iran"
}
]
}
}
As you can see, it basically ignored all my prop requests.
PROBLEM: generator=allpages can't search by category
Here's a search using generator=allpages:
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=info%7Cpageimages%7Cextracts&generator=allpages&inprop=url%7Cdisplaytitle&piprop=name%7Cthumbnail%7Coriginal&pithumbsize=100&exintro=1&explaintext=1&gapprefix=Avatar
Here's the output from that (just the first result):
{
"batchcomplete": "",
"continue": {
"gapcontinue": "Avatar's_Abode",
"continue": "gapcontinue||"
},
"query": {
"pages": {
"100368": {
"pageid": 100368,
"ns": 0,
"title": "Avatar",
"contentmodel": "wikitext",
"pagelanguage": "en",
"pagelanguagehtmlcode": "en",
"pagelanguagedir": "ltr",
"touched": "2018-05-03T11:21:07Z",
"lastrevid": 838959509,
"length": 45784,
"fullurl": "https://en.wikipedia.org/wiki/Avatar",
"editurl": "https://en.wikipedia.org/w/index.php?title=Avatar&action=edit",
"canonicalurl": "https://en.wikipedia.org/wiki/Avatar",
"displaytitle": "Avatar",
"thumbnail": {
"source": "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a0/Avatars.jpg/80px-Avatars.jpg",
"width": 80,
"height": 100
},
"original": {
"source": "https://upload.wikimedia.org/wikipedia/commons/a/a0/Avatars.jpg",
"width": 357,
"height": 448
},
"pageimage": "Avatars.jpg",
"extract": "An avatar (Sanskrit: \u0905\u0935\u0924\u093e\u0930, IAST: avat\u0101ra), a concept in Hinduism that means \"descent\", refers to the material appearance or incarnation of a deity on earth. The relative verb to \"alight, to make one's appearance\" is sometimes used to refer to any guru or revered human being.\nThe word avatar does not appear in the Vedic literature, but appears in verb forms in post-Vedic literature, and as a noun particularly in the Puranic literature after the 6th century CE. Despite that, the concept of an avatar is compatible with the content of the Vedic literature like the Upanishads as it is symbolic imagery of the Saguna Brahman concept in the philosophy of Hinduism. The Rigveda describes Indra as endowed with a mysterious power of assuming any form at will. The Bhagavad Gita expounds the doctrine of Avatara but with terms other than avatar.\nTheologically, the term is most often associated with the Hindu god Vishnu, though the idea has been applied to other deities. Varying lists of avatars of Vishnu appear in Hindu scriptures, including the ten Dashavatara of the Garuda Purana and the twenty-two avatars in the Bhagavata Purana, though the latter adds that the incarnations of Vishnu are innumerable. The avatars of Vishnu are important in Vaishnavism theology. In the goddess-based Shaktism tradition of Hinduism, avatars of the Devi in different appearances such as Tripura Sundari, Durga and Kali are commonly found. While avatars of other deities such as Ganesha and Shiva are also mentioned in medieval Hindu texts, this is minor and occasional. The incarnation doctrine is one of the important differences between Vaishnavism and Shaivism traditions of Hinduism.\nIncarnation concepts similar to avatar are also found in Buddhism, Christianity and others. 
The scriptures of Sikhism include the names of numerous Hindu gods and goddesses, but it rejected the doctrine of savior incarnation and endorsed the view of Hindu Bhakti movement saints such as Namdev that formless eternal god is within the human heart and man is his own savior."
}
}
}
}
...and it's so close to perfect that I can't believe it... the only problem is there's no way to restrict the search to the category "English-language films".
PROBLEM: generator=categorymembers won't show page images and doesn't filter by prefix, only sets the start point by prefix
Here's a search using generator=categorymembers:
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=pageimages&generator=categorymembers&piprop=name%7Cthumbnail%7Coriginal&pithumbsize=100&gcmtitle=Category%3AEnglish-language%20films&gcmprop=&gcmtype=page&gcmlimit=10&gcmstartsortkeyprefix=Avatar
Here are the first ten results (I've left off the prop=extracts and prop=info parameters, which are working fine, so you can see just the relevant detail):
{
"batchcomplete": "",
"continue": {
"gcmcontinue": "page|2953314335314b4d04354b394141011201dcbedc08|47013432",
"continue": "gcmcontinue||"
},
"query": {
"pages": {
"4273140": {
"pageid": 4273140,
"ns": 0,
"title": "Avatar (2009 film)"
},
"15945267": {
"pageid": 15945267,
"ns": 0,
"title": "Avatar (2004 film)"
},
"25813358": {
"pageid": 25813358,
"ns": 0,
"title": "Avatar 2"
},
"27442998": {
"pageid": 27442998,
"ns": 0,
"title": "Avatar 3"
},
"50071841": {
"pageid": 50071841,
"ns": 0,
"title": "Ave Maria (1918 film)"
},
"41748079": {
"pageid": 41748079,
"ns": 0,
"title": "Avenged (2013 U.S. film)"
},
"42739169": {
"pageid": 42739169,
"ns": 0,
"title": "Avenger (film)"
},
"50726142": {
"pageid": 50726142,
"ns": 0,
"title": "The Avenger (1931 film)"
},
"43707905": {
"pageid": 43707905,
"ns": 0,
"title": "The Avengers (1950 film)"
},
"22114132": {
"pageid": 22114132,
"ns": 0,
"title": "The Avengers (2012 film)"
}
}
}
}
And as you can see:
The pageimages data is nowhere to be seen
The setting gcmstartsortkeyprefix=Avatar only sets the point from which the listing starts; it doesn't actually filter out results that don't have that prefix. There's no analogue to the gapprefix parameter that's available when using generator=allpages.
PROBLEM: list=search won't show any prop values and searches by content as well as title
Here's a search using list=search, an avenue of approach I'm grateful to this page for:
https://en.wikipedia.org/w/api.php?action=query&format=jsonfm&prop=info%7Cpageimages%7Cextracts&list=search&inprop=url%7Cdisplaytitle&piprop=name%7Cthumbnail%7Coriginal&pithumbsize=100&exintro=1&explaintext=1&srsearch=Avatar+incategory:English-language_films&srlimit=3
And here's the return (just the first 3 results):
{
"batchcomplete": "",
"continue": {
"sroffset": 3,
"continue": "-||info|pageimages|extracts"
},
"query": {
"searchinfo": {
"totalhits": 224
},
"search": [
{
"ns": 0,
"title": "Avatar (2009 film)",
"pageid": 4273140,
"size": 201954,
"wordcount": 18643,
"snippet": "<span class=\"searchmatch\">Avatar</span>, marketed as James Cameron's <span class=\"searchmatch\">Avatar</span>, is a 2009 American epic science fiction film directed, written, produced, and co-edited by James Cameron, and",
"timestamp": "2018-05-01T01:52:00Z"
},
{
"ns": 0,
"title": "Avatar 2",
"pageid": 25813358,
"size": 55754,
"wordcount": 5380,
"snippet": "<span class=\"searchmatch\">Avatar</span> 2 is an upcoming American epic science fiction film directed, produced, and co-written by James Cameron, and the first of four planned sequels to",
"timestamp": "2018-05-02T10:02:34Z"
},
{
"ns": 0,
"title": "Avatar 3",
"pageid": 27442998,
"size": 17747,
"wordcount": 1374,
"snippet": "<span class=\"searchmatch\">Avatar</span> 3 is an upcoming 2021 American epic science fiction film directed, produced, co-written, and co-edited by James Cameron, scheduled to be released",
"timestamp": "2018-05-02T10:02:45Z"
}
]
}
}
Now this search does request prop=info, prop=pageimages and prop=extracts, but they're nowhere to be seen. There's also no analogue to gapprefix with this approach.
PROBLEM: generator=search won't show pageimages and searches by content as well as title
Here's the same search as above, but using generator=search:
https://en.wikipedia.org/w/api.php?action=query&format=jsonfm&prop=info%7Cpageimages%7Cextracts&generator=search&inprop=url%7Cdisplaytitle&piprop=name%7Cthumbnail%7Coriginal&pithumbsize=100&exintro=1&explaintext=1&gsrsearch=Avatar+incategory:English-language_films&gsrlimit=3
And here's the result:
{
"batchcomplete": "",
"continue": {
"gsroffset": 3,
"continue": "gsroffset||"
},
"query": {
"pages": {
"4273140": {
"pageid": 4273140,
"ns": 0,
"title": "Avatar (2009 film)",
"index": 1,
"contentmodel": "wikitext",
"pagelanguage": "en",
"pagelanguagehtmlcode": "en",
"pagelanguagedir": "ltr",
"touched": "2018-05-01T01:52:00Z",
"lastrevid": 839068297,
"length": 201954,
"fullurl": "https://en.wikipedia.org/wiki/Avatar_(2009_film)",
"editurl": "https://en.wikipedia.org/w/index.php?title=Avatar_(2009_film)&action=edit",
"canonicalurl": "https://en.wikipedia.org/wiki/Avatar_(2009_film)",
"displaytitle": "<i>Avatar</i> (2009 film)",
"extract": "Avatar, marketed as James Cameron's Avatar, is a 2009 American epic science fiction film directed, written, produced, and co-edited by James Cameron, and starring Sam Worthington, Zoe Saldana, Stephen Lang, Michelle Rodriguez, and Sigourney Weaver. The film is set in the mid-22nd century, when humans are colonizing Pandora, a lush habitable moon of a gas giant in the Alpha Centauri star system, in order to mine the mineral unobtanium, a room-temperature superconductor. The expansion of the mining colony threatens the continued existence of a local tribe of Na'vi \u2013 a humanoid species indigenous to Pandora. The film's title refers to a genetically engineered Na'vi body with the mind of a remotely located human that is used to interact with the natives of Pandora.\nDevelopment of Avatar began in 1994, when Cameron wrote an 80-page treatment for the film. Filming was supposed to take place after the completion of Cameron's 1997 film Titanic, for a planned release in 1999, but, according to Cameron, the necessary technology was not yet available to achieve his vision of the film. Work on the language of the film's extraterrestrial beings began in 2005, and Cameron began developing the screenplay and fictional universe in early 2006. Avatar was officially budgeted at $237 million. Other estimates put the cost between $280 million and $310 million for production and at $150 million for promotion. The film made extensive use of new motion capture filming techniques, and was released for traditional viewing, 3D viewing (using the RealD 3D, Dolby 3D, XpanD 3D, and IMAX 3D formats), and for \"4D\" experiences in select South Korean theaters. 
The stereoscopic filmmaking was touted as a breakthrough in cinematic technology.\nAvatar premiered in London on December 10, 2009, and was internationally released on December 16 and in the United States and Canada on December 18, to positive critical reviews, with critics highly praising its groundbreaking visual effects. During its theatrical run, the film broke several box office records and became the highest-grossing film of all time, as well as in the United States and Canada, surpassing Cameron's Titanic, which had held those records for twelve years. It also became the first film to gross more than $2 billion and the best-selling film of 2010 in the United States. Avatar was nominated for nine Academy Awards, including Best Picture and Best Director, and won three, for Best Art Direction, Best Cinematography and Best Visual Effects. Following the film's success, Cameron signed with 20th Century Fox to produce four sequels: Avatar 2 and Avatar 3 are currently filming, and will be released on December 18, 2020, and December 17, 2021 respectively; subsequent sequels will start shooting as soon as they wrap filming, and will be released in 2024 and 2025. Several cast members are expected to return, including Worthington, Saldana, Lang, and Weaver."
},
"25813358": {
"pageid": 25813358,
"ns": 0,
"title": "Avatar 2",
"index": 2,
"contentmodel": "wikitext",
"pagelanguage": "en",
"pagelanguagehtmlcode": "en",
"pagelanguagedir": "ltr",
"touched": "2018-05-02T10:02:34Z",
"lastrevid": 839266311,
"length": 55754,
"fullurl": "https://en.wikipedia.org/wiki/Avatar_2",
"editurl": "https://en.wikipedia.org/w/index.php?title=Avatar_2&action=edit",
"canonicalurl": "https://en.wikipedia.org/wiki/Avatar_2",
"displaytitle": "<i>Avatar 2</i>",
"extract": "Avatar 2 is an upcoming American epic science fiction film directed, produced, and co-written by James Cameron, and the first of four planned sequels to his film Avatar (2009). Cameron is producing the film with Jon Landau, with Josh Friedman originally announced as his co-writer; it was later announced that Cameron, Friedman, Rick Jaffa, Amanda Silver, and Shane Salerno took a part in the writing process of all sequels before being attributed separate scripts, making the eventual writing credits unclear. Cast members Sam Worthington, Zoe Saldana, Stephen Lang, Sigourney Weaver, Giovanni Ribisi, Joel David Moore, Dileep Rao, C. C. H. Pounder, and Matt Gerald are all expected to return.\nCameron, who had stated in 2006 that he would like to make sequels to Avatar if it were successful, announced the first two in 2010 following the widespread success of the first film, with Avatar 2 aiming for a 2014 release. However, the subsequent addition of two more sequels, and the necessity to develop new technology in order to film performance capture scenes underwater, a feature never accomplished before in motion capture history, led to significant delays to allow the crew more time to work on the writing, pre-production, and visual effects; it is currently planned for a release on December 18, 2020, exactly eleven years after the American release of the first film, with the following sequels to be released between 2021 and 2025.\nPreliminary shooting for the film started in Manhattan Beach, California on August 15, 2017, followed by principal photography simultaneously with Avatar 3 in New Zealand on September 25, 2017. The other sequels are expected to start shooting as soon as Avatar 2 and 3's filming wraps."
},
"27442998": {
"pageid": 27442998,
"ns": 0,
"title": "Avatar 3",
"index": 3,
"contentmodel": "wikitext",
"pagelanguage": "en",
"pagelanguagehtmlcode": "en",
"pagelanguagedir": "ltr",
"touched": "2018-05-02T10:02:45Z",
"lastrevid": 839266333,
"length": 17747,
"fullurl": "https://en.wikipedia.org/wiki/Avatar_3",
"editurl": "https://en.wikipedia.org/w/index.php?title=Avatar_3&action=edit",
"canonicalurl": "https://en.wikipedia.org/wiki/Avatar_3",
"displaytitle": "<i>Avatar 3</i>",
"extract": "Avatar 3 is an upcoming 2021 American epic science fiction film directed, produced, co-written, and co-edited by James Cameron, scheduled to be released on December 17, 2021. It is the second of four planned sequels to his film Avatar (2009), and will be a follow-up to Avatar 2 (2020). Cameron is producing the film with Jon Landau, with Rick Jaffa and Amanda Silver originally announced as his co-writers; it was later announced that Cameron, Jaffa, Silver, Josh Friedman and Shane Salerno took a part in the writing process of all of the sequels before being assigned to finish the separate scripts, making the eventual writing credits unclear. Cast members Sam Worthington, Zoe Saldana, Stephen Lang, Sigourney Weaver, Joel David Moore, CCH Pounder and Matt Gerald are all expected to return from the first two films.\nAvatar 3 started shooting simultaneously with Avatar 2 on August 15, 2017. Two additional sequels will start shooting as soon as the first two wrap post-production, and are expected to be released in 2024 and 2025 respectively."
}
}
}
}
...and again this one is so close to perfect, because in this version it does return the prop=info and prop=extracts results, but again it ignores prop=pageimages and I can't find any way to restrict the search to the starting letters of the titles.
Conclusion: SNAFU
...Is there any 'one search to rule them all' here? It's so tantalizing that I can almost, almost, almost get everything with one query, but in the end I can't figure out how to get 'em all. Can anyone help me through the thicket?

On a high level, list modules generate some list of pages (e.g. the pages most recently edited). Sometimes they include some additional information but mostly they are just meant to give you a list of pages. Prop modules operate on a list of pages and add some kind of extra information to each; the list can be determined by the client (via parameters like titles or pageids) or a list module (used as generator= instead of list=). You can use list= and prop= "together" but all that does is create two separate lists (one of which will be empty and not shown because there is no titles or generator parameter).
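To illustrate how a generator feeds pages into the prop modules, here is a Python sketch that reads such a response. It assumes the default (formatversion=1) JSON layout seen in the question, where `query.pages` is an object keyed by page ID, so its key order shouldn't be relied on; generator=search adds an `index` field with the search rank, used here when present. The helper names are illustrative, and `touched` (from prop=info) is only a stand-in for "the date".

```python
def pages_in_order(response):
    """Return the pages of a generator query as a list, ordered by search rank."""
    pages = response.get("query", {}).get("pages", {})
    return sorted(pages.values(), key=lambda p: (p.get("index", 0), p["title"]))


def summarize(page):
    """Pick out the fields the question asks for (missing ones become None)."""
    return {
        "title": page["title"],
        "url": page.get("fullurl"),          # from prop=info with inprop=url
        "touched": page.get("touched"),      # last-touched timestamp, not a release date
        "summary": page.get("extract"),      # from prop=extracts
        "image": page.get("thumbnail", {}).get("source"),  # from prop=pageimages
    }


# Trimmed from the generator=search output in the question.
response = {
    "query": {
        "pages": {
            "25813358": {"pageid": 25813358, "title": "Avatar 2", "index": 2,
                         "fullurl": "https://en.wikipedia.org/wiki/Avatar_2"},
            "4273140": {"pageid": 4273140, "title": "Avatar (2009 film)", "index": 1,
                        "fullurl": "https://en.wikipedia.org/wiki/Avatar_(2009_film)"},
        }
    }
}

for page in pages_in_order(response):
    print(summarize(page))
```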
Your generator queries are fine; you'll need pilicense=any if you want non-free images in the output. Maybe you found some outdated documentation that does not mention that?
You can use intitle:... in the search term for title search (see docs).
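Putting those pieces together, a single generator=search query can combine `intitle:` and `incategory:` with the three prop modules and `pilicense=any`. The Python sketch below only builds the URL; the parameters mirror the question's own queries. One caveat, stated as an assumption: `intitle:` matches words in the title rather than a strict prefix, so a client-side `title.startswith(...)` check may still be needed.

```python
from urllib.parse import parse_qs, urlencode, urlparse

API = "https://en.wikipedia.org/w/api.php"


def build_query(title_word, category, limit=10):
    """Build a generator=search query with info, pageimages and extracts."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "search",
        # intitle: restricts matching to titles; incategory: to the category.
        "gsrsearch": f"intitle:{title_word} incategory:{category}",
        "gsrlimit": limit,
        "prop": "info|pageimages|extracts",
        "inprop": "url|displaytitle",
        "piprop": "name|thumbnail|original",
        "pithumbsize": 100,
        "pilicense": "any",  # also return non-free images
        "exintro": 1,
        "explaintext": 1,
    }
    return API + "?" + urlencode(params)


url = build_query("Avatar", "English-language_films")
print(url)
# Decode the query string again to check what would be sent.
print(parse_qs(urlparse(url).query)["gsrsearch"])
```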

Get request to Google Search

I'm trying to get the HTML of Google search results by sending a GET request to, for example:
https://www.google.ru/?q=1111
In the browser everything works, but when I use curl, or look at the page with "View source", there is only some JavaScript code and no search results. Is this some kind of protection? What can I do?

You now have to use the Google Search API to make your GET requests.
All other methods have been blocked.

The page from your question is the Google Search page with the input field.
The search results page is this one:
https://www.google.ru/search?q=1111
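A small Python sketch of building that results-page URL; `urlencode` takes care of spaces and non-ASCII (e.g. Cyrillic) queries. It only constructs the URL: as the other answers note, fetching it from a script may still be blocked or served differently. `search_url` is an illustrative helper name.

```python
from urllib.parse import urlencode

# Results live under /search; the root URL (/?q=...) is the page with the
# search input field.
SEARCH_URL = "https://www.google.ru/search"


def search_url(query):
    """Return a percent-encoded results-page URL for `query`."""
    return SEARCH_URL + "?" + urlencode({"q": query})


print(search_url("1111"))  # https://www.google.ru/search?q=1111
print(search_url("кофе латте"))
```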
Rotate proxies and user agents, and delay similar requests, to get the HTML from Google Search results pages with fewer bans.
Or use SerpApi to access HTML and the extracted data from it. It has a free trial.
curl -s 'https://serpapi.com/search?q=coffee'
Output
{
// Omitted
"organic_results": [
{
"position": 1,
"title": "Coffee - Wikipedia",
"link": "https://en.wikipedia.org/wiki/Coffee",
"displayed_link": "en.wikipedia.org › wiki › Coffee",
"snippet": "Coffee is a brewed drink prepared from roasted coffee beans, the seeds of berries from certain Coffea species. When coffee berries turn from green to bright red ...",
"sitelinks": {
"expanded": [
{
"title": "History",
"link": "https://en.wikipedia.org/wiki/History_of_coffee",
"snippet": "The history of coffee dates back to the 15th century, and possibly ..."
},
{
"title": "International Coffee Day",
"link": "https://en.wikipedia.org/wiki/International_Coffee_Day",
"snippet": "International Coffee Day (1 October) is an occasion that is ..."
},
{
"title": "List of coffee drinks",
"link": "https://en.wikipedia.org/wiki/List_of_coffee_drinks",
"snippet": "Milk coffee - Nitro cold brew coffee - List of coffee dishes - ..."
},
{
"title": "Portal:Coffee",
"link": "https://en.wikipedia.org/wiki/Portal:Coffee",
"snippet": "Coffee is a brewed drink prepared from roasted coffee beans, the ..."
},
{
"title": "Coffee bean",
"link": "https://en.wikipedia.org/wiki/Coffee_bean",
"snippet": "A coffee bean is a seed of the Coffea plant and the source for ..."
},
{
"title": "Geisha",
"link": "https://en.wikipedia.org/wiki/Geisha_(coffee)",
"snippet": "Geisha coffee, sometimes referred to as Gesha coffee, is a type of ..."
}
],
"list": [
{
"date": "Color‎: ‎Black, dark brown, light brown, beige"
}
]
},
"rich_snippet": {
"bottom": {
"detected_extensions": {
"introduced_th_century": 15
},
"extensions": [
"Introduced‎: ‎15th century",
"Color‎: ‎Black, dark brown, light brown, beige"
]
}
},
"cached_page_link": "https://webcache.googleusercontent.com/search?q=cache:U6oJMnF-eeUJ:https://en.wikipedia.org/wiki/Coffee+&cd=2&hl=sv&ct=clnk&gl=se",
"related_pages_link": "https://www.google.se/search?gl=se&hl=sv&q=related:https://en.wikipedia.org/wiki/Coffee+coffee&sa=X&ved=2ahUKEwjJ9p2p_KXuAhVlRN8KHf22D8wQHzABegQIAhAJ"
}
},
// ...
}
Disclaimer: I work at SerpApi.

To add a bit more to the existing answers, since they are not correct and don't address your problem:
First of all, it's perfectly legal to scrape Google as long as you do not harm their service through it (DoS-like).
Also, these methods have not been blocked; it's just not that simple.
The speed depends on your methods; it does not have to be very slow.
You can scrape tens of thousands of keyword pages in a minute if needed.
You will find a better answer to the topic here: Is it ok to scrape data from Google results?
Your problem with curl does indeed come from protection: Google does not allow automated access, and it has a very sophisticated set of detection algorithms.
They range from simple user-agent checks (that's what stopped you directly) up to artificial intelligence that tries to detect unusual or related queries.

You can load it in the browser and then scrape the results via JavaScript.
Or you can use the Google API, but it seems to require payment if you make more than 100 requests per day.
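If you go the API route, Google's Custom Search JSON API returns results as JSON instead of HTML. A hedged Python sketch of building a request URL: `API_KEY` and `ENGINE_ID` are placeholders you would replace with your own API key and Programmable Search Engine ID (`cx`), and `custom_search_url` is an illustrative helper name. In the response, each result in the `items` list carries fields such as `title`, `link` and `snippet`.

```python
from urllib.parse import urlencode

# Placeholders: substitute your own API key and search engine ID (cx).
API_KEY = "YOUR_API_KEY"
ENGINE_ID = "YOUR_ENGINE_ID"


def custom_search_url(query, start=1):
    """Build a Custom Search JSON API request URL; `start` is the 1-based first result."""
    params = {"key": API_KEY, "cx": ENGINE_ID, "q": query, "start": start}
    return "https://www.googleapis.com/customsearch/v1?" + urlencode(params)


print(custom_search_url("1111"))
```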

How to get API.AI to simply send me the JSON data of the conversation?

I am trying to find out whether there is an option to get the conversation logs with some sort of webhook.
The API.AI docs only mention using a webhook for fulfillment, but for now I don't plan for my server (a GCP App Engine app) to supply fulfillment, only to log the relevant parameters from each conversation.
Does anyone know how to approach this?

Turn on the webhook feature for the intent. You will then receive the requests and all the data associated with them, and you can send a response back to API.AI too. Here is the full round trip:
{
"id": "891db09a-851c-43dc-81c6-4c6705c94f85",
"timestamp": "2017-01-03T10:31:18.676Z",
"result": {
"source": "agent",
"resolvedQuery": "yes, France",
"action": "show.news",
"actionIncomplete": false,
"parameters": {
"adjective": "",
"subject": "France"
},
"contexts": [
{
"name": "subject",
"parameters": {
"subject.original": "France",
"adjective": "",
"subject": "France",
"adjective.original": ""
},
"lifespan": 5
},
{
"name": "region",
"parameters": {
"subject.original": "France",
"adjective": "",
"subject": "France",
"adjective.original": ""
},
"lifespan": 5
}
],
"metadata": {
"intentId": "34773849-4ac2-4e28-95a5-7abfc061044e",
"webhookUsed": "true",
"webhookForSlotFillingUsed": "false",
"intentName": "subject"
},
"fulfillment": {
"speech": "Here is the latest news\n\n According to Watson the main emotion expressed in the article is: ;( ( sadness )\n\n Son of Equatorial Guinea’s president facing trial in France\n\nPARIS — After years of investigation, France on Monday put the son of the president of Equatorial Guinea on trial for corruption, charged with spending many millions in state funds — much of it allegedly in cash — to feed an opulent lifestyle of fast cars, designer clothes, works of art and...\n\nRead more: https://www.washingtonpost.com/world/europe/son-of-equatorial-guineas-president-facing-trial-in-france/2017/01/02/b03d30d0-d0cb-11e6-9651-54a0154cf5b3_story.html",
"source": "Washington Post",
"displayText": "Here is the latest news. According to Watson the main emotion expressed in the article is: sadness",
"messages": [
{
"type": 0,
"speech": "Here is the latest news\n\n According to Watson the main emotion expressed in the article is: ;( ( sadness )\n\n Son of Equatorial Guinea’s president facing trial in France\n\nPARIS — After years of investigation, France on Monday put the son of the president of Equatorial Guinea on trial for corruption, charged with spending many millions in state funds — much of it allegedly in cash — to feed an opulent lifestyle of fast cars, designer clothes, works of art and...\n\nRead more: https://www.washingtonpost.com/world/europe/son-of-equatorial-guineas-president-facing-trial-in-france/2017/01/02/b03d30d0-d0cb-11e6-9651-54a0154cf5b3_story.html"
}
],
"data": {
"newsAgent": {
"adjective": "",
"subject": "France",
"intent": "subject",
"action": "show.news",
"news": {
"title": "Son of Equatorial Guinea’s president facing trial in France",
"source": "Washington Post",
"link": "https://www.washingtonpost.com/world/europe/son-of-equatorial-guineas-president-facing-trial-in-france/2017/01/02/b03d30d0-d0cb-11e6-9651-54a0154cf5b3_story.html",
"language": "english",
"body": "PARIS — After years of investigation, France on Monday put the son of the president of Equatorial Guinea on trial for corruption, charged with spending many millions in state funds — much of it allegedly in cash — to feed an opulent lifestyle of fast cars, designer clothes, works of art and...",
"emotion": "sadness",
"emoticon": ";("
},
"speech": "Here is the latest news",
"sessionId": "0856125a-d0bc-4cba-990d-cbcfaea536db"
}
}
},
"score": 1
},
"status": {
"code": 206,
"errorType": "partial_content",
"errorDetails": "Webhook call failed. Error message: Webhook contains contexts with empty names or names containing whitespaces. ErrorId: 131000fa-0ec1-4efb-b47c-64301ac7bb2b"
},
"sessionId": "0856125a-d0bc-4cba-990d-cbcfaea536db"
}
The result object is the request that API.AI sends you; you get the contexts objects as well.
The fulfillment object is the response my endpoint sent back to API.AI.
Check the documentation for the details.
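For the logging-only use case in the question, a minimal sketch of such a webhook in Node.js follows. It assumes the request body has the v1 shape shown above (id, timestamp, sessionId, result.resolvedQuery, result.parameters, result.contexts, result.metadata.intentName) and uses console.log as a stand-in for a real log sink. The helper names are hypothetical, and the reply text is a placeholder; whether the agent should answer with it or with the intent's own response is something to confirm in the API.AI docs.

```javascript
// Sketch of an API.AI (v1) webhook used mainly for logging.
// ASSUMPTIONS: request bodies follow the v1 format shown above;
// console.log stands in for your real log storage.
const http = require("http");

// Pick out the fields worth logging from one webhook request body.
function toLogRecord(body) {
  const result = body.result || {};
  const metadata = result.metadata || {};
  return {
    id: body.id,
    timestamp: body.timestamp,
    sessionId: body.sessionId,
    query: result.resolvedQuery,
    action: result.action,
    intentName: metadata.intentName,
    parameters: result.parameters || {},
    contexts: (result.contexts || []).map((c) => c.name),
  };
}

// Webhook reply using the speech/displayText/source fields seen above.
function buildReply(text) {
  return { speech: text, displayText: text, source: "logger" };
}

// Returns an http.Server; call .listen(port) to start it.
function createLoggingServer(replyText) {
  return http.createServer((req, res) => {
    if (req.method !== "POST") {
      res.writeHead(405);
      res.end();
      return;
    }
    let raw = "";
    req.setEncoding("utf8"); // decode chunks as UTF-8 strings
    req.on("data", (chunk) => { raw += chunk; });
    req.on("end", () => {
      let body;
      try {
        body = JSON.parse(raw);
      } catch (err) {
        res.writeHead(400);
        res.end();
        return;
      }
      console.log(JSON.stringify(toLogRecord(body)));
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify(buildReply(replyText)));
    });
  });
}
```

Starting it would look like createLoggingServer("Thanks!").listen(8080), with the agent's webhook URL pointed at the deployed endpoint.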

Repeating over nested Objects with Angular JS

I've been playing with the Google Feed API for a podcast I run and wanted to include a simple ng-repeat to display the title and the link URL to the MP3. However, the JSON Google provides is nested in several different objects and arrays. For instance, my JSON feed looks like this:
{
"responseData": {
"feed": {
"feedUrl": "http://feeds.feedburner.com/stillgotgame",
"title": "2old2play presents Still Got Game",
"link": "http://www.2old2play.com/",
"author": "",
"description": "Still Got Game focuses on the gaming industry from the perspective of adult gamers. We look at news, reviews, and inside information in the world of video games. Each episode touches on the community, the industry, and the games that keep us coming back.",
"type": "rss20",
"entries": [
{
"mediaGroups": [
{
"contents": [
{
"url": "http://traffic.libsyn.com/dsmooth/Still_Got_Game_Episode_33__Coast_to_Coast.mp3",
"fileSize": "35346436",
"type": "audio/mpeg"
}
]
}
],
"title": "Still Got Game Ep. 33: Coast to Coast",
"link": "http://2old2play.com/media/still-got-game-ep-33-coast-coast/",
"author": "podcast#2old2play.com",
"publishedDate": "Tue, 06 May 2014 22:05:01 -0700",
"contentSnippet": "DSmooth finally has his Rocket Bro back. After a multi-week hiatus for Doodirock's move to the West Coast, they boys were back ...",
"content": "DSmooth finally has his Rocket Bro back. After a multi-week hiatus for Doodirock's move to the West Coast, they boys were back in force this week. The duo talk gaming news and the new releases, cover a bunch of viewer feedback, and talk a bit about what may be the worst moving company ever. They'll have you LMFAOing! You can always call the boys at (773) 527-2961 and weigh in yourself, or tune in live Monday nights at 9:00 EDT at http://twitch.tv/2old2play ...",
"categories": [
"Audio"
]
}
]
}
},
"responseDetails": null,
"responseStatus": 200
}
As you can see, to get to each item's MP3 URL I have to go through entries, mediaGroups, and contents before I even reach the array I need! I start off inside the entries with this factory I've created:
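angular.module('twitchappApp')
// $resource is provided by the ngResource module, so the app module must list 'ngResource' as a dependency.
.factory('audioFEED', function($resource){
return $resource('http://ajax.googleapis.com/ajax/services/feed/load?v=1.0&num=100&q=http://feeds.feedburner.com/stillgotgame',{},
{
query:{
// No custom headers here: JSONP loads the response through a <script> tag, so request
// headers cannot be set (and Access-Control-Allow-Origin is a response header anyway).
method:'JSONP',
// In AngularJS before 1.6, JSON_CALLBACK is replaced with a generated callback name.
params: {callback: 'JSON_CALLBACK'},
// The Feed API returns a single object (responseData), not an array.
isArray:false
}
});
});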
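The factory above passes the feed address in q without encoding it. That happens to work for this feed, but a feed URL with its own query string (containing & or =) would be split into separate parameters. A sketch of building the same load URL with the feed address encoded, reusing the v, num and q parameters from the factory; the function name is hypothetical, and https is used as in the find URL at the top of this page.

```javascript
// Sketch: build the Feed API "load" URL with the feed address encoded.
// Parameter names (v, num, q) come from the factory above; the helper
// name buildFeedLoadUrl is hypothetical.
function buildFeedLoadUrl(feedUrl, num) {
  const params = new URLSearchParams({ v: "1.0", num: String(num), q: feedUrl });
  return "https://ajax.googleapis.com/ajax/services/feed/load?" + params.toString();
}

// For the podcast feed this yields ...&q=http%3A%2F%2Ffeeds.feedburner.com%2Fstillgotgame
console.log(buildFeedLoadUrl("http://feeds.feedburner.com/stillgotgame", 100));
```

The resulting string can be passed as the first argument to $resource in place of the hard-coded URL.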
That's easy enough; I just set up the data in the controller here:
'use strict';
angular.module('twitchappApp')
.controller('audioCtrl', function($scope, audioFEED) {
audioFEED.query(function(data){
$scope.audios = data.responseData.feed.entries;
console.log($scope.audios);
});
});
However, to get to that data I have to set up multiple ng-repeats, one inside the next. I would really like to find a better way to handle this data within the controller and access the URL inside a single ng-repeat. This approach seems to add more overhead and is probably not the best overall method. Is there a best practice for this? My current end result looks like this:
<h1>Audio</h1>
<div ng-repeat="audio in audios">
<h3>{{ audio.title }}</h3>
<p>{{audio.contentSnippet}}</p>
<div ng-repeat="play in audio.mediaGroups">
<div ng-repeat="playurl in play.contents">
PLAY
</div>
</div>
</div>
Yuk...

Check out this JSFiddle. It uses Underscore to flatten your data down to an easier-to-work-with array: http://jsfiddle.net/ahchurch/sKeY9/3/
Template
<div ng-controller="MyCtrl">
<div ng-repeat="playurl in contents">
PLAY
</div>
</div>
JavaScript
var myApp = angular.module('myApp',[]);
// A global controller function like this only works in AngularJS before 1.3
// (or with $controllerProvider.allowGlobals()); newer versions need
// myApp.controller('MyCtrl', ...) instead.
function MyCtrl($scope) {
var responseData = {
"responseData": {
"feed": {
"feedUrl": "http://feeds.feedburner.com/stillgotgame",
"title": "2old2play presents Still Got Game",
"link": "http://www.2old2play.com/",
"author": "",
"description": "Still Got Game focuses on the gaming industry from the perspective of adult gamers. We look at news, reviews, and inside information in the world of video games. Each episode touches on the community, the industry, and the games that keep us coming back.",
"type": "rss20",
"entries": [
{
"mediaGroups": [
{
"contents": [
{
"url": "http://traffic.libsyn.com/dsmooth/Still_Got_Game_Episode_33__Coast_to_Coast.mp3",
"fileSize": "35346436",
"type": "audio/mpeg"
}
]
}
],
"title": "Still Got Game Ep. 33: Coast to Coast",
"link": "http://2old2play.com/media/still-got-game-ep-33-coast-coast/",
"author": "podcast#2old2play.com",
"publishedDate": "Tue, 06 May 2014 22:05:01 -0700",
"contentSnippet": "DSmooth finally has his Rocket Bro back. After a multi-week hiatus for Doodirock's move to the West Coast, they boys were back ...",
"content": "DSmooth finally has his Rocket Bro back. After a multi-week hiatus for Doodirock's move to the West Coast, they boys were back in force this week. The duo talk gaming news and the new releases, cover a bunch of viewer feedback, and talk a bit about what may be the worst moving company ever. They'll have you LMFAOing! You can always call the boys at (773) 527-2961 and weigh in yourself, or tune in live Monday nights at 9:00 EDT at http://twitch.tv/2old2play ...",
"categories": [
"Audio"
]
}
]
}
},
"responseDetails": null,
"responseStatus": 200
};
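// Underscore: map each entry to the contents arrays of its mediaGroups, then
// flatten the nested arrays into one flat array of {url, fileSize, type} objects.
$scope.contents = _.flatten(_.map(responseData.responseData.feed.entries, function(entry){
return _.map(entry.mediaGroups, function(mediaGroup){
return mediaGroup.contents;
});
}));
}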
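If you would rather not pull in Underscore, or want each episode's title next to its URL so that one ng-repeat can show both, here is a sketch using plain JavaScript's Array.prototype.flatMap (ES2019, so older browsers would need a polyfill). The function name flattenEpisodes and its output field names are hypothetical; entries without mediaGroups or contents are simply skipped.

```javascript
// Sketch without Underscore: turn entries -> mediaGroups -> contents into one
// flat list, keeping each episode's title and snippet next to its media URL.
// Missing mediaGroups/contents arrays are treated as empty.
function flattenEpisodes(feedResponse) {
  const entries = feedResponse.responseData.feed.entries || [];
  return entries.flatMap((entry) =>
    (entry.mediaGroups || []).flatMap((group) =>
      (group.contents || []).map((content) => ({
        title: entry.title,
        snippet: entry.contentSnippet,
        url: content.url,
        type: content.type,
      }))
    )
  );
}
```

In the question's controller this could become $scope.episodes = flattenEpisodes(data); inside the audioFEED.query callback, with a single ng-repeat="episode in episodes" whose PLAY link uses ng-href="{{ episode.url }}". Note that it assumes responseData is present, so check responseStatus first if the request can fail.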
