Currently I'm trying to automatically scrape/download Yahoo Finance historical data. I plan to download the data using the download link provided on the website.
My code lists all the available links on the page so I can work from there; the problem is that the exact link doesn't appear in the result. Here is my code (partial):
import requests
import bs4 as bs
from datetime import datetime, timedelta

def scrape_page(url, header):
    page = requests.get(url, headers=header)
    if page.status_code == 200:
        soup = bs.BeautifulSoup(page.content, 'html.parser')
        return soup
    return None
if __name__ == '__main__':
    symbol = 'GOOGL'
    dt_start = datetime.today() - timedelta(days=(365 * 5 + 1))
    dt_end = datetime.today()
    start = format_date(dt_start)
    end = format_date(dt_end)
    sub = subdomain(symbol, start, end)
    header = header_function(sub)
    base_url = 'https://finance.yahoo.com'
    url = base_url + sub
    soup = scrape_page(url, header)
    result = soup.find_all('a')
    for a in result:
        print('URL :', a['href'])
UPDATE 10/9/2020:
I managed to find the span that is the parent of the link with this code:
spans = soup.find_all('span', {"class": "Fl(end) Pos(r) T(-6px)"})
However, when I print it out, it does not show the link. Here is the output:
>>> spans
[<span class="Fl(end) Pos(r) T(-6px)" data-reactid="31"></span>]
To download the historical data in CSV format from Yahoo Finance, you can use this example:
import requests
from datetime import datetime

# Yahoo Finance CSV endpoint; period1/period2 are Unix timestamps
csv_link = 'https://query1.finance.yahoo.com/v7/finance/download/{quote}?period1={from_}&period2={to_}&interval=1d&events=history'

quote = 'GOOGL'
from_ = str(int(datetime(2019, 9, 27, 0, 0).timestamp()))
to_ = str(int(datetime(2020, 9, 27, 23, 59).timestamp()))

print(requests.get(csv_link.format(quote=quote, from_=from_, to_=to_)).text)
Prints:
Date,Open,High,Low,Close,Adj Close,Volume
2019-09-27,1242.829956,1244.989990,1215.199951,1225.949951,1225.949951,1706100
2019-09-30,1220.599976,1227.410034,1213.420044,1221.140015,1221.140015,1223500
2019-10-01,1222.489990,1232.859985,1205.550049,1206.000000,1206.000000,1225200
2019-10-02,1196.500000,1198.760010,1172.630005,1177.920044,1177.920044,1651500
2019-10-03,1183.339966,1191.000000,1163.140015,1189.430054,1189.430054,1418400
2019-10-04,1194.290039,1212.459961,1190.969971,1210.959961,1210.959961,1214100
2019-10-07,1207.000000,1218.910034,1204.359985,1208.250000,1208.250000,852000
2019-10-08,1198.770020,1206.869995,1189.479980,1190.130005,1190.130005,1004300
2019-10-09,1201.329956,1208.459961,1198.119995,1202.400024,1202.400024,797400
2019-10-10,1198.599976,1215.619995,1197.859985,1209.469971,1209.469971,642100
2019-10-11,1224.030029,1228.750000,1213.640015,1215.709961,1215.709961,1116500
2019-10-14,1213.890015,1225.880005,1211.880005,1217.770020,1217.770020,664800
2019-10-15,1221.500000,1247.130005,1220.920044,1242.239990,1242.239990,1379200
2019-10-16,1241.810059,1254.189941,1238.530029,1243.000000,1243.000000,1149300
2019-10-17,1251.400024,1263.750000,1249.869995,1252.800049,1252.800049,1047900
2019-10-18,1254.689941,1258.109985,1240.140015,1244.410034,1244.410034,1581200
2019-10-21,1248.699951,1253.510010,1239.989990,1244.280029,1244.280029,904700
2019-10-22,1244.479980,1248.729980,1239.849976,1241.199951,1241.199951,1143100
2019-10-23,1240.209961,1258.040039,1240.209961,1257.630005,1257.630005,1064100
2019-10-24,1259.109985,1262.900024,1252.349976,1259.109985,1259.109985,1011200
...and so on.
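If you want the data in a DataFrame rather than raw text, pandas can usually read the same URL directly. A minimal sketch, assuming the endpoint accepts pandas' default user agent just as it accepts the plain requests call above:

import pandas as pd

# Same endpoint as above; read_csv fetches the URL and parses the CSV in one step
df = pd.read_csv(csv_link.format(quote=quote, from_=from_, to_=to_))
print(df.head())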
I figured it out. That link is generated by JavaScript, and requests.get() won't work on dynamic content. I switched to Selenium to download from that link.
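For reference, a minimal sketch of that Selenium approach; the 'a[download]' selector is an assumption about how the rendered download link is marked up, so verify it against the live page:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get(url)  # the same historical-data URL built above
# The download anchor only exists after the page's JavaScript has run
link = driver.find_element_by_css_selector('a[download]')
print(link.get_attribute('href'))
driver.quit()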
Right now I'm attempting to make a simple music player app that streams music or video directly from a YouTube URL. To do that, I need the full download of the search page that's used to search for videos to stream. But I'm having some problems with urlopen in Python 3, which is what I'm using to build the command-line application: it won't load the ytd-app tag on YouTube, which is where a good deal of the video and playlist references are placed when the search first loads. Does anyone know what's going on, or know some type of workaround for it? Thanks!
My code so far:
from urllib.request import urlopen
from bs4 import BeautifulSoup as BS

BASICURL = "https://www.youtube.com/results?"

# query and filtercriteria come from user input earlier in the program
query = query.split()
ret = ""
stufffound = {}
for x in query:
    ret = ret + x + "+"
ret = ret[:len(ret) - 1]

# URL BUILDER
if filtercriteria:
    URL = BASICURL + "sp={0}".format(filtercriteria) + "&search_query={0}".format(ret)
else:
    URL = BASICURL + "search_query={0}".format(ret)

passdict = {}

def findvideosonpage(url, dictToAddTo):
    # Fetch the page once and parse it, rather than reopening the URL
    soup = BS(urlopen(url).read(), 'html.parser')
    results = soup.findAll(attrs={'class': 'yt-simple-endpoint style-scope ytd-video-renderer'})
    for i, x in enumerate(results):
        dictToAddTo[i] = x['href']
        print(x)
    # Dictionary is meant to be converted into a list later to order the results
    return [k for _, k in sorted(zip(dictToAddTo.values(), dictToAddTo.keys()))]
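As with the Yahoo Finance question above, the ytd-app contents are rendered by JavaScript, so urlopen only receives the initial HTML. Driving a real browser is one workaround; a minimal sketch with Selenium, where the 'a#video-title' selector is an assumption about YouTube's rendered markup:

from selenium import webdriver

# Sketch only: let a real browser render the results, then read the video anchors
driver = webdriver.Chrome()
driver.get(URL)  # the search URL built above
for a in driver.find_elements_by_css_selector('a#video-title'):
    print(a.get_attribute('href'))
driver.quit()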
Below is my web-scraping code for a website; it clicks a form which redirects to a page. From that page I need to extract the img src URL and export it into a CSV in text form. I used the code below to extract content from a td tag, but it doesn't work here because the td tag has no text content, only an img tag. Any help will be appreciated; I am new to web scraping. Thanks in advance.
import csv
from selenium.common.exceptions import NoSuchElementException

browser.find_element_by_css_selector(".textinput[value='APPLY']").click()
# select_finder = "//tr[contains(text(), 'NB')]//a"
select_finder = "//td[text()='NB']/../td[2]/a"
browser.find_element_by_css_selector(".content a").click()

file_data = []
try:
    assert "Application Details" in browser.title
    enlargement = browser.find_element_by_xpath("/html/body/center/table[15]/tbody/tr[3]/td[2]/b").text
    enlargement_answer1 = browser.find_element_by_xpath("/html/body/center/table[15]/tbody/tr[4]/td[2]").text
    enlargement_answer2 = browser.find_element_by_xpath("/html/body/center/table[15]/tbody/tr[4]/td[3]").text
    enlargement_text = enlargement + enlargement_answer1 + enlargement_answer2
    considerations = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[4]/td[2]/b").text
    considerations_answer = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[4]/td[3]").text
    considerations_text = considerations + considerations_answer
    alteration = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[4]/td[6]/b").text
    alteration_answer = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[4]/td[7]").text
    alteration_text = alteration + alteration_answer
    units = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[5]/td[3]/b").text
    units_answer = browser.find_element_by_xpath("/html/body/center/table[15]/tbody/tr[5]/td[4]").text
    units_text = units + units_answer
    occupancy = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[6]/td[3]/b").text
    occupancy_answer = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[6]/td[4]").text
    occupancy_text = occupancy + occupancy_answer
    coo = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[7]/td[3]/b").text
    coo_answer = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[7]/td[4]").text
    coo_text = coo + coo_answer
    floors = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[8]/td[3]/b").text
    floors_answer = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[8]/td[4]").text
    floors_text = floors + floors_answer
except (NoSuchElementException, AssertionError) as e:
    # These are plain strings, so assign fallback values rather than calling append()
    floors_text = "No Zoning Characteristics Present"
    coo_text = "n/a"
    occupancy_text = "n/a"
    units_text = "n/a"
    alteration_text = "n/a"
    considerations_text = "n/a"
    enlargement_text = "n/a"

with open('DOB.csv', 'a') as f:
    wr = csv.writer(f, dialect='excel')
    wr.writerow((block_number, lot_number, houseno, street, condo_text,
                 vacant_text, city_owned_text, file_data, floors_text, coo_text, occupancy_text,
                 units_text, alteration_text, considerations_text, enlargement_text))

browser.close()
As you stated you are new to web scraping, I encourage you to read up a bit: http://selenium-python.readthedocs.io/locating-elements.html
You are using XPath exclusively and in ways that are not recommended.
From the docs: "You can use XPath to either locate the element in absolute terms (not advised), or relative to an element that does have an id or name attribute."
Try using other locators to get your image.
For example: driver.find_element_by_css_selector("img[src='images/box_check.gif']")
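To get the src value itself rather than the element, something like this sketch should work; the "td img" selector is an assumption about your page's structure:

# Locate the image inside the td and read its src attribute
img = browser.find_element_by_css_selector("td img")
img_src = img.get_attribute("src")
file_data.append(img_src)  # collect it for the CSV row, as in your script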
I wanted a Twitter forwarder to Telegram.
I found this one: https://github.com/franciscod/telegram-twitter-forwarder-bot
The problem now is that if a tweet contains a hashtag before a link, Telegram shows me the link to the hashtag.
I tried different things and searched about it, but I don't know how to receive only plain text from Twitter.
Also, I don't get the short t.co link if the tweet is too long; it's just a long link.
for tweet in tweets:
    self.logger.debug("- Got tweet: {}".format(tweet.text))

    # Check if tweet contains media, else check if it contains a link to an image
    extensions = ('.jpg', '.jpeg', '.png', '.gif')
    pattern = '[(%s)]$' % ')('.join(extensions)
    photo_url = ''
    tweet_text = html.unescape(tweet.text)
    if 'media' in tweet.entities:
        photo_url = tweet.entities['media'][0]['media_url_https']
    else:
        for url_entity in tweet.entities['urls']:
            expanded_url = url_entity['expanded_url']
            if re.search(pattern, expanded_url):
                photo_url = expanded_url
                break
    if photo_url:
        self.logger.debug("- - Found media URL in tweet: " + photo_url)

    for url_entity in tweet.entities['urls']:
        expanded_url = url_entity['expanded_url']
        indices = url_entity['indices']
        display_url = tweet.text[indices[0]:indices[1]]
        tweet_text = tweet_text.replace(display_url, expanded_url)

    tw_data = {
        'tw_id': tweet.id,
        'text': tweet_text,
        'created_at': tweet.created_at,
        'twitter_user': tw_user,
        'photo_url': photo_url,
    }
    try:
        t = Tweet.get(Tweet.tw_id == tweet.id)
        self.logger.warning("Got duplicated tw_id on this tweet:")
        self.logger.warning(str(tw_data))
    except Tweet.DoesNotExist:
        tweet_rows.append(tw_data)
    if len(tweet_rows) >= self.TWEET_BATCH_INSERT_COUNT:
        Tweet.insert_many(tweet_rows).execute()
        tweet_rows = []
Just disable the markdown_twitter_hashtags() function: make it return the text without doing that replacement.
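A minimal sketch of that change, assuming the function simply takes the tweet text and returns the markdown version (check the actual signature in the linked repo):

def markdown_twitter_hashtags(text):
    # Skip the hashtag-to-markdown-link replacement and pass the text through unchanged
    return text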
I need to save the currently-loaded source file of an audio tag. Sounds simple, but here's the catch: the source gives a random sound file on every request.
The audio tag is created, the source set, and the audio played with JavaScript, as seen here:
function createAudio() {
    var audio = document.createElement('audio');
    audio.setAttribute('id', 'file_audio');
    audio.setAttribute('controls', 'controls');
    audio.setAttribute('autoplay', 'true');
    audio.setAttribute('hidden', 'true');
    audio.appendChild(createSource());
    return audio;
}

function createSource() {
    var source = document.createElement('source');
    var d = new Date();
    source.setAttribute('id', 'file_audio_source');
    source.setAttribute('src', 'file.wav?r=' + d.getTime());
    source.setAttribute('type', 'audio/wav');
    return source;
}

this.switchAudio = function() {
    var d = new Date();
    $svjq("#file_audio").find('audio').remove();
    $svjq("#file_audio").find('source').remove();
    $svjq("#file_audio").find('embed').remove();
    if (Modernizr.audio.wav) {
        document.getElementById("file_audio").appendChild(createAudio());
    } else {
        $svjq("#file_audio").append('<embed id="file_audio_embed" name="file_audio_embed" src="file.wav?r=' + d.getTime() + '" autostart="true" cache="false" type="audio/wav" hidden="true" loop="false" enablejavascript="true">');
    }
};

this.playAgain = function() {
    if (Modernizr.audio.wav) {
        document.getElementById('file_audio').play();
    } else {
        document.getElementById('file_audio_embed').play();
    }
};
I need to be able to save the currently-loaded file in the source. However, if you access the file URL in the browser, it returns a different file.
Automated processes such as Watir-WebDriver, Capybara (Capybara-Webkit), and Mechanize also return a random file. For example:
require 'capybara'

session = Capybara::Session.new(:selenium)
session.visit('url')
session.click_link 'play sound' # on every click you get a new sound
session.click_link 'play again'

# file_audio_source
e = session.find_by_id('file_audio_source')
e[:src]

# save the currently open page and open it
# session.save_and_open_page

# returns a different file
session.visit(e[:src])
# returns a different file
session.execute_script("window.open('" + e[:src] + "')")

require 'mechanize'

agent = Mechanize.new { |agent| agent.ssl_version, agent.verify_mode = 'SSLv3', OpenSSL::SSL::VERIFY_NONE }
filedata = agent.get(e[:src]).content
aFile = File.new("/Users/me/Documents/test/test111.wav", 'wb')
# aFile.syswrite(filedata)
Could the file be embedded into the HTML or cached? And is there a way to get the file and save it locally?
Other options include recording from the sound device or using the mic to record the sound played, though this option is not at all ideal.
opt.1:
require 'capybara'
session = Capybara::Session.new(:selenium)
session.visit('url')
session.click_link 'Play sound' # this gets the file into the cache; then use the code below to get it out
opt.2:
# execute the JavaScript that loads the file / creates the sound URL, without playing the sound
session.execute_script("document.getElementById('file_audio').appendChild(createSource());")
e = session.find_by_id("file_audio_source")
session.visit(e[:src])
Watir and Capybara perform great! :)
But now the problem is making it headless, and it seems that the headless browser doesn't act the same as the non-headless one.
A method to give headless functionality:
def headless_get_file url
  require 'uri'
  res = @session.driver.cookies
  agent = Mechanize.new { |agent| agent.ssl_version, agent.verify_mode = 'SSLv3', OpenSSL::SSL::VERIFY_NONE }
  uri = URI('https://....')
  # Copy the browser session's cookies into the Mechanize agent
  res.keys.each do |i|
    temp = res[i]
    cookie = Mechanize::Cookie.new(i, temp.value)
    cookie.domain = temp.domain
    cookie.path = temp.path
    agent.cookie_jar.add(uri, cookie)
  end
  filedata = agent.get(url).content
  aFile = File.new("#{dir}/file.wav", 'wb')
  aFile.syswrite(filedata)
end
Could the file be embedded into the HTML or cached?
Yes it can! See: Is it possible to use data URIs in video and audio tags?
<audio controls="controls" autobuffer="autobuffer" autoplay="autoplay">
<source src="data:audio/wav;base64,UklGRhwMAABXQVZFZm10IBAAAAABAAEAgD4AAIA+AAABAAgAZGF0Ya4LAACAgICAgICAgICAgICAgICAgICAgICAgICAf3hxeH+AfXZ1eHx6dnR5fYGFgoOKi42aloubq6GOjI2Op7ythXJ0eYF5aV1AOFFib32HmZSHhpCalIiYi4SRkZaLfnhxaWptb21qaWBea2BRYmZTVmFgWFNXVVVhaGdbYGhZbXh1gXZ1goeIlot1k6yxtKaOkaWhq7KonKCZoaCjoKWuqqmurK6ztrO7tbTAvru/vb68vbW6vLGqsLOfm5yal5KKhoyBeHt2dXBnbmljVlJWUEBBPDw9Mi4zKRwhIBYaGRQcHBURGB0XFxwhGxocJSstMjg6PTc6PUxVV1lWV2JqaXN0coCHhIyPjpOenqWppK6xu72yxMu9us7Pw83Wy9nY29ve6OPr6uvs6ezu6ejk6erm3uPj3dbT1sjBzdDFuMHAt7m1r7W6qaCupJOTkpWPgHqAd3JrbGlnY1peX1hTUk9PTFRKR0RFQkRBRUVEQkdBPjs9Pzo6NT04Njs+PTxAPzo/Ojk6PEA5PUJAQD04PkRCREZLUk1KT1BRUVdXU1VRV1tZV1xgXltcXF9hXl9eY2VmZmlna3J0b3F3eHyBfX+JgIWJiouTlZCTmpybnqSgnqyrqrO3srK2uL2/u7jAwMLFxsfEv8XLzcrIy83JzcrP0s3M0dTP0drY1dPR1dzc19za19XX2dnU1NjU0dXPzdHQy8rMysfGxMLBvLu3ta+sraeioJ2YlI+MioeFfX55cnJsaWVjXVlbVE5RTktHRUVAPDw3NC8uLyknKSIiJiUdHiEeGx4eHRwZHB8cHiAfHh8eHSEhISMoJyMnKisrLCszNy8yOTg9QEJFRUVITVFOTlJVWltaXmNfX2ZqZ21xb3R3eHqAhoeJkZKTlZmhpJ6kqKeur6yxtLW1trW4t6+us7axrbK2tLa6ury7u7u9u7vCwb+/vr7Ev7y9v8G8vby6vru4uLq+tri8ubi5t7W4uLW5uLKxs7G0tLGwt7Wvs7avr7O0tLW4trS4uLO1trW1trm1tLm0r7Kyr66wramsqaKlp52bmpeWl5KQkImEhIB8fXh3eHJrbW5mYGNcWFhUUE1LRENDQUI9ODcxLy8vMCsqLCgoKCgpKScoKCYoKygpKyssLi0sLi0uMDIwMTIuLzQ0Njg4Njc8ODlBQ0A/RUdGSU5RUVFUV1pdXWFjZGdpbG1vcXJ2eXh6fICAgIWIio2OkJGSlJWanJqbnZ2cn6Kkp6enq62srbCysrO1uLy4uL+/vL7CwMHAvb/Cvbq9vLm5uba2t7Sysq+urqyqqaalpqShoJ+enZuamZqXlZWTkpGSkpCNjpCMioqLioiHhoeGhYSGg4GDhoKDg4GBg4GBgoGBgoOChISChISChIWDg4WEgoSEgYODgYGCgYGAgICAgX99f398fX18e3p6e3t7enp7fHx4e3x6e3x7fHx9fX59fn1+fX19fH19fnx9fn19fX18fHx7fHx6fH18fXx8fHx7fH1+fXx+f319fn19fn1+gH9+f4B/fn+AgICAgH+AgICAgIGAgICAgH9+f4B+f35+fn58e3t8e3p5eXh4d3Z1dHRzcXBvb21sbmxqaWhlZmVjYmFfX2BfXV1cXFxaWVlaWVlYV1hYV1hYWVhZWFlaWllbXFpbXV5fX15fYWJhYmNiYWJhYWJjZGVmZ2hqbG1ub3Fxc3V3dnd6e3t8e3x+f3+AgICAgoGBgoKDhISFh4aHiYqKi4uMjYyOj4+QkZKUlZWXmJmbm52enqCioqSlpqeoqaqrrK2ur7CxsrGys7O0tbW2tba3t7i3uLe4t7a3t7i3tre2tba1tLSzsrKysbCvrq2sq6qop6alo6OioJ+dnJqZmJeWlJKSkI+OjoyLioiIh4WEg4GBgH9+fXt6eXh3d3V0c3JxcG9ubWxsamppaWhnZmVlZGRjYmNiYWBhYGBfYF9fXl5fXl1dXVxdXF1dXF1cXF1cXF1dXV5dXV5fXl9eX19gYGFgYWJhYmFiY2NiY2RjZGNkZWRlZGVmZmVmZmVmZ2dmZ2hnaGhnaGloZ2hpaWhpamlqaWpqa2pra2xtbGxtbm1ubm5vcG9wcXBxcnFycnN0c3N0dXV2d3d4eHh5ent6e3x9fn5/f4CAgIGCg4SEhYaGh4iIiYqLi4uMjY2Oj5CQkZGSk5OUlJWWlpeYl5iZmZqbm5ybnJ2cnZ6en56fn6ChoKChoqGio6KjpKOko6SjpKWkpaSkpKSlpKWkpaSlpKSlpKOkpKOko6KioaKhoaCfoJ+enp2dnJybmpmZmJeXlpWUk5STkZGQj4+OjYyLioqJh4eGhYSEgoKBgIB/fn59fHt7enl5eHd3dnZ1dHRzc3JycXBxcG9vbm5tbWxrbGxraWppaWhpaGdnZ2dmZ2ZlZmVmZWRlZGVkY2RjZGNkZGRkZGRkZGRkZGRjZGRkY2RjZGNkZWRlZGVmZWZmZ2ZnZ2doaWhpaWpra2xsbW5tbm9ub29wcXFycnNzdHV1dXZ2d3d4eXl6enp7fHx9fX5+f4CAgIGAgYGCgoOEhISFhoWGhoeIh4iJiImKiYqLiouLjI2MjI2OjY6Pj46PkI+QkZCRkJGQkZGSkZKRkpGSkZGRkZKRkpKRkpGSkZKRkpGSkZKRkpGSkZCRkZCRkI+Qj5CPkI+Pjo+OjY6Njo2MjYyLjIuMi4qLioqJiomJiImIh4iHh4aHhoaFhoWFhIWEg4SDg4KDgoKBgoGAgYCBgICAgICAf4CAf39+f35/fn1+fX59fHx9fH18e3x7fHt6e3p7ent6e3p5enl6enl6eXp5eXl4eXh5eHl4eXh5eHl4eXh5eHh3eHh4d3h4d3h3d3h4d3l4eHd4d3h3eHd4d3h3eHh4eXh5eHl4eHl4eXh5enl6eXp5enl6eXp5ent6ent6e3x7fHx9fH18fX19fn1+fX5/fn9+f4B/gH+Af4CAgICAgIGAgYCBgoGCgYKCgoKDgoOEg4OEg4SFhIWEhYSFhoWGhYaHhoeHhoeGh4iHiIiHiImIiImKiYqJiYqJiouKi4qLiouKi4qLiouKi4qLiouKi4qLi4qLiouKi4qLiomJiomIiYiJiImIh4iIh4iHhoeGhYWGhYaFhIWEg4OEg4KDgoOCgYKBgIGAgICAgH+Af39+f359fn18fX19fHx8e3t6e3p7enl6eXp5enl6enl5eXh5eHh5eHl4eXh5eHl4eHd5eHd3eHl4d3h3eHd4d3h3eHh4d3h4d3h3d3h5eHl4eXh5eHl5eXp5enl6eXp7ent6e3p7e3t7fHt8e3x8fHx9fH1+fX59fn9+f35/gH+AgICAgICAgYGAgYKBgoGCgoKDgoOEg4SEhIWFhIWFhoWGhYaGhoaHhoeGh4aHhoeIh4iHiIeHiIeIh4iHiIeIiIiHiIeIh4iHiIiHiIeIh4iHiIeIh4eIh4eIh4aHh4aH
hoeGh4aHhoWGhYaFhoWFhIWEhYSFhIWEhISDhIOEg4OCg4OCg4KDgYKCgYKCgYCBgIGAgYCBgICAgICAgICAf4B/f4B/gH+Af35/fn9+f35/fn1+fn19fn1+fX59fn19fX19fH18fXx9fH18fXx9fH18fXx8fHt8e3x7fHt8e3x7fHt8e3x7fHt8e3x7fHt8e3x7fHt8e3x8e3x7fHt8e3x7fHx8fXx9fH18fX5+fX59fn9+f35+f35/gH+Af4B/gICAgICAgICAgICAgYCBgIGAgIGAgYGBgoGCgYKBgoGCgYKBgoGCgoKDgoOCg4KDgoOCg4KDgoOCg4KDgoOCg4KDgoOCg4KDgoOCg4KDgoOCg4KDgoOCg4KDgoOCg4KDgoOCg4KCgoGCgYKBgoGCgYKBgoGCgYKBgoGCgYKBgoGCgYKBgoGCgYKBgoGCgYKBgoGBgYCBgIGAgYCBgIGAgYCBgIGAgYCBgIGAgYCBgIGAgYCAgICBgIGAgYCBgIGAgYCBgIGAgYCBgExJU1RCAAAASU5GT0lDUkQMAAAAMjAwOC0wOS0yMQAASUVORwMAAAAgAAABSVNGVBYAAABTb255IFNvdW5kIEZvcmdlIDguMAAA" />
</audio>
Is there a way to get it and save it locally?
Yes you can!
You just have to find the cache directory :)
http://www.digitalmediaminute.com/article/626/viewing-browser-cache-in-firefox
Then write a little code to go and fetch it. This code goes with opt.1, not opt.2:
def getlatestdir(newdirs)
  times = Array.new
  newdirs.each_with_index do |newdir, index|
    times[index] = File::mtime(newdir)
  end
  temp = times[0]
  count = 0
  times.each_with_index do |time, index|
    if temp < time
      temp = time
      count = index
    end
  end
  return newdirs[count]
end

def getCacheDir
  # how to get the path: in irb enter
  #   require 'capybara'
  #   session = Capybara::Session.new(:selenium)
  #   session.visit('https://www.google.co.za')
  # then open a new tab and enter about:cache
  # copy the disk cache device cache directory ( from /var/... to .../T/ )
  path = '/var/folders/9x/51cvmc215xx6zy9vd_64sxwc0000gn/T/'
  dirs = Dir.glob(path + '*/')
  newdirs = Array.new
  dirs.each_with_index do |dir, index|
    if (dir.include? 'webdriver-profile')
      newdirs[newdirs.length] = dir
    end
  end
  the_cache_dir = getlatestdir(newdirs) + 'Cache'
  return the_cache_dir
end

def saveFile
  rifffile = ''
  count = 0
  the_cache_dir = getCacheDir
  files = Dir.glob(the_cache_dir + '/*/*/*')
  # A WAV file starts with the ASCII bytes 'RIFF'
  files.each_with_index do |file, index|
    bytes = open(file, 'rb') { |io| io.read }
    str = bytes[0].to_s + bytes[1].to_s + bytes[2].to_s + bytes[3].to_s
    if (str == 'RIFF')
      count = index
      rifffile = file
      break
    end
  end
  puts rifffile
  filename = 'test123.wav'
  # read file bytes
  bytes = File.open(rifffile, 'rb') { |io| io.read }
  # write file to the directory
  f = File.new(filename, 'wb')
  f.syswrite(bytes)
  return filename
end
Granted, the above code isn't the greatest or fastest, but it gets the job done.
As for "recording from the sound device, or using the mic to record the sound played": that would take too long and too much effort :P
In summary, opt.1 is OK but not great; opt.2 is far, far better :)
ajt