Parsing data requests from google flights using google flights package - json

I'm working on interacting with the google flights api (qpx). I am using the following link and working with the following experimental package to feed in information for a request:
https://github.com/rweyant/googleflights
Below is the code I have thus far for anyone interested in replicating my results:
#call library and data-------------------------------------------------------------------
library(googleflights)
library(MUCflights) #to access airport codes
data("airports")
#codes for countries i'm interested in------------------------------------------
code_list = airports
#later interface for updating codes
my_destinations = matrix(c("San Juan", "Amsterdam", "Berlin",
"San Diego", "Lima", "Cali", "Havana"))
my_home = matrix(c("LGA", "JFK"))
#loop extract
code_list = airports
code_bucket = NULL
for (i in my_destinations) {
print(i)
drop = code_list[code_list$City == i,c("City","IATA")]
drop = as.data.frame(drop)
print(drop)
code_bucket = rbind(code_bucket, drop)
code_bucket = as.data.frame(code_bucket)
}
#clean my code bucket---------------------------------------------------------------
code_bucket = na.omit(code_bucket)
code_bucket = code_bucket[code_bucket$IATA != "",]
code_bucket
#feed in codes into function---------------------------------------------------------
#each ping to QPX will combine NYC to x
#data i want
# pricing
# times
key = "(key is here)"
set_apikey(key)
result_flights = search(my_home[1], code_bucket[2,2], "2016-11-27", "2016-11-28")
I've been looking through the package details to understand the functionality and noticed that the request comes back as a list as opposed to a JSON, which seems to be for the application of a "summarise_segment" function that isn't working for me. Here is the link to the function I'm referencing:
https://github.com/rweyant/googleflights/blob/master/R/unpack.R
I'm wondering if anyone has any luck or ideas for parsing out the request that returns? The resulting list is large and I'm reaching the limits of my knowledge on dealing with these structures. Any help in pointing me in the right direction would be appreciated!

Related

Difficulties with web scraping

I have just came to an article called The 500 Greatest Songs of All Time and thought "oh that's cool I bet they also made a Spotify/Apple music list that I can follow". Well...they don't.
So in a nutshell, I wonder if it's possible to 1) scrap the website to extract the songs and 2) then do some kind of bulk upload to Spotify to create the list.
Songs' titles and authors are structured like this in the website:
Website screenshot. I have already tried to scrap the web with the importxml() formula in google sheets but with no success.
I understand the scrapping part is easier than the other and, as I am new to programming, I would be happy to manage to partially achieve this goal. I am sure this task can be achieved easily on python.
I feel like explaining everything would go beyond the scope here, so I tried to comment the code well enough.
1. Scrape the songs
I used python3 and selenium, their website doesn't block that.
Be sure to adjust your chromedriver path, and the output path of the .txt file at the bottom if necessary. Once it's done and you have your .txt file you can close it.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
s = Service(r'/Users/main/Desktop/chromedriver')
driver = webdriver.Chrome(service=s)
# just setting some vars, I used Xpath because I know that
top_500 = 'https://www.rollingstone.com/music/music-lists/best-songs-of-all-time-1224767/'
cookie_button_xpath = "// button [#id = 'onetrust-accept-btn-handler']"
div_containing_links_xpath = "// div [#id = 'pmc-gallery-list-nav-bar-render'] // child :: a"
song_names_xpath = "// article [#class = 'c-gallery-vertical-album'] / child :: h2"
links = []
songs = []
driver.get(top_500)
# accept cookies, give time to load
time.sleep(3)
cookie_btn = driver.find_element(By.XPATH, cookie_button_xpath)
cookie_btn.click()
time.sleep(1)
# extracting all the links since there are only 50 songs per page
links_to_next_pages = driver.find_elements(By.XPATH, div_containing_links_xpath)
for element in links_to_next_pages:
l = element.get_attribute('href')
links.append(l)
# extracting the songs, then going to next page and so on until we hit 500
counter = 1 # were starting with 1 here since links[0] is the current page we are already on
while True:
list = driver.find_elements(By.XPATH, song_names_xpath)
for element in list:
s = element.text
songs.append(s)
if len(songs) == 500:
break
driver.get(links[counter])
counter += 1
time.sleep(2)
# verify that there are no duplicates, if there were, something would be off
if len(songs) != len( set(songs) ):
print('you f***** up')
else:
print('seems fine')
with open('/Users/main/Desktop/output_songs.txt', 'w') as file:
file.writelines(line + '\n' for line in songs)
2. Prepare Spotify
Go to the Spotify Developer Dashboard and create an
account (use your Spotify acc).
Then create an app, call it whatever you want.
On your app click settings and whitelist http://localhost:8888/callback
On your app click "users and access" and add your Spotify account
Leave the tab open, we'll come back to it
3. Prepare Your Environment
You need Node.js so make sure that is installed on your machine
Download this from Spotifys GitHub
Unzip it, cd into the folder and run npm install
Go into the authorization_code folder and open app.js in a editor
Find var scope and append ' playlist-modify-public' to the string, this is so that your app can access you Spotify playlists, see here
Now go back to the app in your Spotify Developer Dashboard we'll need to copy the Client ID and the Client Secret into the var client_id and var client_secret respectively (in the app.js file). var redirect_uri will be
http://localhost:8888/callback - don't forget to save your changes.
4. Run the Spotify side of things
cd into the authorization_code folder and run app.js with node app.js (this is basically a server running on your PC)
Now if that works leave it running and go to http://localhost:8888, authorise your Spotify account there
There copy the full token, including the overflow, use inspect element to get it
Adjust the user_id and auth variables as well as the path to the output_songs.txt (at with open) in the following python script and run that, songs which are not found will be printed out at the end, give it a search with Google. They are usually on Spotify as well but Google seem to have the better search algorithm (surprised Pikachu face).
import requests
import re
import json
# this is NOT you display name, it's your user name!!
user_id = 'YOUR_USERNAME'
# paste your auth token from spotify; it can time out then you have to get a new one, so dont panic if you get a bunch of responses in the 400s after some time
auth = {"Authorization": "Bearer YOUR_AUTH_KEY_FROM_LOCALHOST"}
playlist = []
err_log = []
base_url = 'https://api.spotify.com/v1'
search_method = '/search'
with open('/Users/main/Desktop/output_songs.txt', 'r') as file:
songs = file.readlines()
# this querys spotify does some magic and then appends the tracks spotify uri to an array
def query_song_uris():
for n, entry in enumerate(songs):
x = re.findall(r"'([^']*)'", entry)
title_len = len(entry) - len(x[0]) - 4
title = x[0]
artist = entry[:title_len]
payload = {
'q': (entry),
'track:': (title),
'artist:': (artist),
'type': 'track',
'limit': 1
}
url = base_url + search_method
try:
r = requests.get(url, params=payload, headers=auth)
print('\nquerying spotify; ', r)
c = r.content.decode('UTF-8')
dic = json.loads(c)
track_uri = dic["tracks"]["items"][0]["uri"]
playlist.append(track_uri)
print(track_uri)
except:
err = f'\nNr. {(len(songs)-n)}: ' + f'{entry}'
err_log.append(err)
playlist.reverse()
query_song_uris()
# creates a playlist and returns playlist id
def create_playlist():
payload = {
"name": "Rolling Stone: Top 500 (All Time)",
"description": "music for old men xD with occasional hip hop appearences. just kidding"
}
url = base_url + f'/users/{user_id}/playlists'
r = requests.post(url, headers=auth, json=payload)
c = r.content.decode('UTF-8')
dic = json.loads(c)
print(f'\n\ncreating playlist #{dic["id"]}; ', r)
return dic["id"]
def add_to_playlist():
playlist_id = create_playlist()
while True:
if len(playlist) > 100:
p = playlist[:100]
else:
p = playlist
payload = {"uris": (p)}
url = base_url + f'/playlists/{playlist_id}/tracks'
r = requests.post(url, headers=auth, json=payload)
print(f'\nadding {len(p)} songs to playlist; ', r)
del playlist[ : len(p) ]
if len(playlist) == 0:
break
add_to_playlist()
print('\n\ncheck your spotify :)')
print("\n\n\nthese tracks didn't make it, check manually:\n")
for line in err_log:
print(line)
print('\n\n')
Done
If you don't want to run the code yourself, heres the playlist:
https://open.spotify.com/playlist/5fdLKYNFlA4XSvhEl36KXS
If you have trouble, everything from step 2 on is also described here in the Web API quick start or in general in the web API docs.
Regarding Apple Music
So Apple seems very closed up (surprise haha). What I found though is that you can query the i-Tunes store. Given response also contains a direct link to the song(s) on Apple music.
You might be able to go from there.
Get ISRC code from iTunes Search API (Apple music)
PS: undeniably regex is witchcraft, but y'all here got my back

Weird KeyError (Python)

So, I have to work with this JSON (from URL):
{'player': {'racing': 25260.154000000017, 'player': 259114.57700000296}, 'farming': {'fishing': 33783.390999999414, 'mining': 29048.60500000002, 'farming': 25334.504000000023}, 'piloting': {'piloting': 25570.18800000001, 'cargos': 3080.713000000036, 'heli': 10433.977000000004}, 'physical': {'strength': 198358.86700000675}, 'business': {'business': 50922.88500000005}, 'trucking': {'mechanic': 2724.5620000000004, 'garbage': 755.642999999997, 'trucking': 223784.99700000713, 'postop': 1411.4190000000006}, 'train': {'bus': 669.1940000000001, 'train': 1363.805999999999}, 'ems': {'fire': 25449.43400000001, 'ems': 13844.628000000012}, 'hunting': {'skill': 4179.033000000316}, 'casino': {'casino': 18545.526000000027}}
It is indeed one line. I am trying to make it so that for example, I can get racing, which is the first one you see. For this, you need go into Player first, and then you can get to Racing. How do I do this?
My current code:
def allthethings():
# Grab all the skills
geturl = ("http://server.tycoon.community:30120/status/data/" + str(setting_playerid))
print(geturl)
a = requests.get(geturl,headers={"X-Tycoon-Key":setting_apikeyTT}).json()
jsonconverted = (a["data"]["gaptitudes_v"])
print(jsonconverted)
# Convert JSON into many, many variables
Raw_RACR = jsonconverted['player.racing']
print(Raw_RACR)
I believe this is all the code that is needed.
Also, this is the error:
KeyError: 'player.racing'

trying to get properties of objects inside object properties

Sometimes JavaScript is playing with me (although the deal was that I would be playing with it...) This test code below keeps resisting so I'm looking for a little help from more clever people around here.
Answering to a recent question I tried to create a readable list of all the color IDs useable in Google Advanced Calendar API.
The request is very simple : Calendar.Colors.get()
The response is an object with a couple of properties, each one being other objects with other properties.
I can go down to the second level but the last -and most useful in this case - level returns a disturbing "undefined" (see partial log below)
And that's my question...
code with comments :
function getColorList(){
var colors = Calendar.Colors.get();
//Logger.log(JSON.stringify(colors));
for(var cat in colors){
Logger.log("category "+cat+" = "+JSON.stringify(colors[cat])+'\n\n')
}
// from there I try the "event" category
var events = colors["event"];
Logger.log('object colors["event"] = '+ JSON.stringify(events))
// then I try to get every properties in this object
for(var val in events){
Logger.log("key "+val+" = "+JSON.stringify(events[val]))
}
}
Full log is viewable here (externalized to keep this reasonably short)
Looks like (key) may be indicating a read-only definition as Sandy was eluding to.
Just make your own object from colors to loop through after converting it to string:
var json = JSON.stringify(colors["event"]);
var myObj = JSON.parse(json);
for(var val in myObj){
Logger.log("key "+ val +" = "+JSON.stringify(myObj[val]))
}

R - Vectorize a JSON call

Working with Mapquest directions API to plot thousands of routes using ggplot2 in R.
Basic code theory: Have a list of end locations and a single start location. For each end location, a call to fromJSON returns routing coordinates from Mapquest. From there, have already vectorized the assignment of coordinates (read as lists in lists) to the geom_path geom of ggplot2.
Right now, running this on a location set of ~ 1200 records takes ~ 4 minutes. Would love to get that down. Any thoughts on how to vectorize the call to fromJSON (which returns a list of lists)?
Windows 7, 64-bit, R 2.14.2
libraries: plyr, ggplot2, rjson, mapproj, XML
k = 0
start_loc = "263+NORTH+CENTER+ST.,+MESA+ARIZ."
end_loc = funder_trunc[,length(funder_trunc)]
route_urls = paste(mapquest_baseurl, "&from=", start_loc, "&to=", end_loc, "&ambiguities=ignore", sep="")
for (n in route_urls) {
route_legs = fromJSON(file = url(n))$route$legs[[1]]$maneuvers
lats = unlist(lapply(route_legs, function(x) return(x$startPoint[[2]])))
lngs = unlist(lapply(route_legs, function(x) return(x$startPoint[[1]])))
frame = data.frame(cbind(lngs, lats))
path_added = geom_path(aes(lngs, lats), data = frame)
p = p + path_added
k = k + 1
print(paste("Processed ", k, " of ", nrow(funder_trunc), " records in set.", sep=""))
}
Going out on a limb here since I don't use rjson or mapproj, but it seems like calling the server thousands of times is the real culprit. If the mapquest server doesn't have an API that allows you to send multiple requests in one go, you are in trouble. If it does, then you need to find out how to use/modify rjson and/or mapproj to call it...
As #Chase said, you might be able to call it in parallel, but the server won't like getting too many parallel requests from the same client - it might ban you. Btw, it might not even like getting thousands of serial requests in rapid succession from the same client either - but apparently your current code works so I guess it doesn't mind.

Can I load a .csv into Simplegeo Storage?

Is there a straightforward way to load a .csv file into Simplegeo Storage? I don't have great coding skills and I'm trying to get things set up so I can ask a freelancer to create some maps for my app. If someone has existing code to do this I can probably figure out how to make it work for my situation.
I just skimmed over the api. Here's a basic example in python
Assumed csv format:
layer, id, lat, lon
python
from simplegeo.models import Record, Client
lines = open('file.csv').split('\n')
client = Client('your-oauth-token', 'your-oauth-secret')
for line in lines:
parts = line.split(',')
if len(parts) == 4:
layer = parts[0].strip()
id = parts[1].strip()
lat = float(parts[2].strip())
lon = float(parts[3].strip())
r = Record(layer, id, lat, lon)
client.storage.add_record(r)
After a bit more digging, I found a python example on their site for this exact purpose
https://simplegeo.com/docs/tutorials/general-hackery#how-import-csv-file-simplegeo
import csv
import simplegeo
OAUTH_TOKEN = '[insert_oauth_token_here]'
OAUTH_SECRET = '[insert_oauth_secret_here]'
CSV_FILE = '[insert_csv_file_here]'
LAYER = '[insert_layer_name_here]'
client = simplegeo.Client(OAUTH_TOKEN, OAUTH_SECRET)
def insert(data):
layer = LAYER
id=data.pop("id")
lat=data.pop("latitude")
lon=data.pop("longitude")
# Grab more columns if you wish
record = simplegeo.Record(layer,id,lat,lon,**data)
client.add_record(record)
r = csv.DictReader(open(CSV_FILE, mode='U'))
for l in r:
insert(l)