currently I'm trying to automatically scrape/download yahoo finance historical data. I plan to download the data using the download link provided in the website.
My code is to list all the available link and work it from there, the problem is that the exact link doesn't appear in the result. Here is my code(partial):
def scrape_page(url, header):
page = requests.get(url, headers=header)
if page.status_code == 200:
soup = bs.BeautifulSoup(page.content, 'html.parser')
return soup
return null
if __name__ == '__main__':
symbol = 'GOOGL'
dt_start = datetime.today() - timedelta(days=(365*5+1))
dt_end = datetime.today()
start = format_date(dt_start)
end = format_date(dt_end)
sub = subdomain(symbol, start, end)
header = header_function(sub)
base_url = 'https://finance.yahoo.com'
url = base_url + sub
soup = scrape_page(url, header)
result = soup.find_all('a')
for a in result:
print('URL :',a['href'])
UPDATE 10/9/2020 :
I managed to find the span which is the parent for the link with this code
spans = soup.find_all('span',{"class":"Fl(end) Pos(r) T(-6px)"})
However, when I print it out, it does not show the link, here is the output:
>>> spans
[<span class="Fl(end) Pos(r) T(-6px)" data-reactid="31"></span>]
To download the historical data in CSV format from Yahoo Finance, you can use this example:
import requests
from datetime import datetime
csv_link = 'https://query1.finance.yahoo.com/v7/finance/download/{quote}?period1={from_}&period2={to_}&interval=1d&events=history'
quote = 'GOOGL'
from_ = str(datetime.timestamp(datetime(2019,9,27,0,0))).split('.')[0]
to_ = str(datetime.timestamp(datetime(2020,9,27,23,59))).split('.')[0]
print(requests.get(csv_link.format(quote=quote, from_=from_, to_=to_)).text)
Prints:
Date,Open,High,Low,Close,Adj Close,Volume
2019-09-27,1242.829956,1244.989990,1215.199951,1225.949951,1225.949951,1706100
2019-09-30,1220.599976,1227.410034,1213.420044,1221.140015,1221.140015,1223500
2019-10-01,1222.489990,1232.859985,1205.550049,1206.000000,1206.000000,1225200
2019-10-02,1196.500000,1198.760010,1172.630005,1177.920044,1177.920044,1651500
2019-10-03,1183.339966,1191.000000,1163.140015,1189.430054,1189.430054,1418400
2019-10-04,1194.290039,1212.459961,1190.969971,1210.959961,1210.959961,1214100
2019-10-07,1207.000000,1218.910034,1204.359985,1208.250000,1208.250000,852000
2019-10-08,1198.770020,1206.869995,1189.479980,1190.130005,1190.130005,1004300
2019-10-09,1201.329956,1208.459961,1198.119995,1202.400024,1202.400024,797400
2019-10-10,1198.599976,1215.619995,1197.859985,1209.469971,1209.469971,642100
2019-10-11,1224.030029,1228.750000,1213.640015,1215.709961,1215.709961,1116500
2019-10-14,1213.890015,1225.880005,1211.880005,1217.770020,1217.770020,664800
2019-10-15,1221.500000,1247.130005,1220.920044,1242.239990,1242.239990,1379200
2019-10-16,1241.810059,1254.189941,1238.530029,1243.000000,1243.000000,1149300
2019-10-17,1251.400024,1263.750000,1249.869995,1252.800049,1252.800049,1047900
2019-10-18,1254.689941,1258.109985,1240.140015,1244.410034,1244.410034,1581200
2019-10-21,1248.699951,1253.510010,1239.989990,1244.280029,1244.280029,904700
2019-10-22,1244.479980,1248.729980,1239.849976,1241.199951,1241.199951,1143100
2019-10-23,1240.209961,1258.040039,1240.209961,1257.630005,1257.630005,1064100
2019-10-24,1259.109985,1262.900024,1252.349976,1259.109985,1259.109985,1011200
...and so on.
I figured it out. That link is generated by javascript and requests.get() method won't work on dynamic content. I switched to selenium to download that link.
I am trying to read xlsx file from unread mail and convert it to data frame finally it will insert into MySQL DB.To avoid duplication while inserting each row of data frame i check if the data already present in db,for this duplication i check mails one by one.
My issue is when two or more unread mail is present in inbox this duplication check fails.
detach_dir = os.path.dirname(os.path.abspath(__file__)) +
'/attachments'
user = "abc#outlook.in"
pwd = "xyz#123*"
m = imaplib.IMAP4_SSL("outlook.office365.com")
m.login(user,pwd)
# Select the mailbox
m.select("folder name in mail")
n = 0
resp, items = m.search(None, '(UNSEEN)')
items = items[0].split()
for emailid in items:
resp, data = m.fetch(emailid, "(RFC822)")
email_body = data[0][1]
mail = email.message_from_bytes(email_body)
if mail.get_content_maintype() != 'multipart':
continue
att_path = os.path.join(detach_dir, filename)
if not os.path.isfile(att_path) :
fp = open(att_path, 'wb')
fp.write(part.get_payload(decode=True))
fp.close()
df_mail = pd.read_excel(att_path,skiprows=
[0,2,3,4,5,6,7,8,9],skip_blank_lines=True,skipfooter=1,index=False)
df_mail = df_mail.fillna(0)
df_mail.dropna(how="all", inplace=True)
for i, row in df_mail.iterrows():
sql = 'SELECT * FROM `tablename WHERE condition for duplicate'
extist=con.execute(sql)
duplicate=extist.fetchall()
if len(duplicate) == 0:
df_mail.iloc[i:i+1].to_sql('table', con = engine, if_exists = 'append', chunksize = 1000,index=False)
else:
print("duplicate data")
Can you please share your code and a list of things you have tried? It is very hard to help without those.
How one should do image file uploads with Pyramid, SQLAlchemy and deform? Preferably so that one can easily get image thumbnail tags in the templates. What configuration is needed (store images on the file system backend, so on).
This question is by no means specifically asking one thing. Here however is a view which defines a form upload with deform, tests the input for a valid image file, saves a record to a database, and then even uploads it to amazon S3. This example is shown under the links to the various documentation I have referenced.
To upload a file with deform see the documentation.
If you want to learn how to save image files to disk, see this article see the official documentation
Then if you want to learn how to save new items with SQLAlchemy see the SQLAlchemy tutorial.
If you want to ask a better question where a more precise detailed answer can be given for each section, then please do so.
#view_config(route_name='add_portfolio_item',
renderer='templates/user_settings/deform.jinja2',
permission='view')
def add_portfolio_item(request):
user = request.user
# define store for uploaded files
class Store(dict):
def preview_url(self, name):
return ""
store = Store()
# create a form schema
class PortfolioSchema(colander.MappingSchema):
description = colander.SchemaNode(colander.String(),
validator = Length(max=300),
widget = text_area,
title = "Description, tell us a few short words desribing your picture")
upload = colander.SchemaNode(
deform.FileData(),
widget=widget.FileUploadWidget(store))
schema = PortfolioSchema()
myform = Form(schema, buttons=('submit',), action=request.url)
# if form has been submitted
if 'submit' in request.POST:
controls = request.POST.items()
try:
appstruct = myform.validate(controls)
except ValidationFailure, e:
return {'form':e.render(), 'values': False}
# Data is valid as far as colander goes
f = appstruct['upload']
upload_filename = f['filename']
extension = os.path.splitext(upload_filename)[1]
image_file = f['fp']
# Now we test for a valid image upload
image_test = imghdr.what(image_file)
if image_test == None:
error_message = "I'm sorry, the image file seems to be invalid is invalid"
return {'form':myform.render(), 'values': False, 'error_message':error_message,
'user':user}
# generate date and random timestamp
pub_date = datetime.datetime.now()
random_n = str(time.time())
filename = random_n + '-' + user.user_name + extension
upload_dir = tmp_dir
output_file = open(os.path.join(upload_dir, filename), 'wb')
image_file.seek(0)
while 1:
data = image_file.read(2<<16)
if not data:
break
output_file.write(data)
output_file.close()
# we need to create a thumbnail for the users profile pic
basewidth = 530
max_height = 200
# open the image we just saved
root_location = open(os.path.join(upload_dir, filename), 'r')
image = pilImage.open(root_location)
if image.size[0] > basewidth:
# if image width greater than 670px
# we need to recduce its size
wpercent = (basewidth/float(image.size[0]))
hsize = int((float(image.size[1])*float(wpercent)))
portfolio_pic = image.resize((basewidth,hsize), pilImage.ANTIALIAS)
else:
# else the image can stay the same size as it is
# assign portfolio_pic var to the image
portfolio_pic = image
portfolio_pics_dir = os.path.join(upload_dir, 'work')
quality_val = 90
output_file = open(os.path.join(portfolio_pics_dir, filename), 'wb')
portfolio_pic.save(output_file, quality=quality_val)
profile_full_loc = portfolio_pics_dir + '/' + filename
# S3 stuff
new_key = user.user_name + '/portfolio_pics/' + filename
key = bucket.new_key(new_key)
key.set_contents_from_filename(profile_full_loc)
key.set_acl('public-read')
public_url = key.generate_url(0, query_auth=False, force_http=True)
output_dir = os.path.join(upload_dir)
output_file = output_dir + '/' + filename
os.remove(output_file)
os.remove(profile_full_loc)
new_image = Image(s3_key=new_key, public_url=public_url,
pub_date=pub_date, bucket=bucket_name, uid=user.id,
description=appstruct['description'])
DBSession.add(new_image)
# add the new entry to the association table.
user.portfolio.append(new_image)
return HTTPFound(location = route_url('list_portfolio', request))
return dict(user=user, form=myform.render())
I am new to programming and F# is my first language.
Here are the relevant parts of my code:
let internal saveJsonToFile<'t> (someObject:'t) (filePath: string) =
use fileStream = new FileStream(filePath, FileMode.OpenOrCreate)
(new DataContractJsonSerializer(typeof<'t>)).WriteObject(fileStream, someObject)
let dummyFighter1 = { id = 1; name = "Dummy1"; location = "Here"; nationality = "Somalia"; heightInMeters = 2.0; weightInKgs = 220.0; weightClass = "Too fat"}
let dummyFighter2 = { id = 2; name = "Dummy2"; location = "There"; nationality = "Afghanistan"; heightInMeters = 1.8; weightInKgs = 80.0; weightClass = "Just Nice"}
let filePath = #"G:\User\Fighters.json"
saveJsonToFile dummyFighter1 filePath
saveJsonToFile dummyFighter2 filePath
When I run "saveJsonToFile dummyFighter1 filePath", the information is successfully saved. My problem is this: Once I run "saveJsonToFile dummyFighter2 filePath", it immediately replaces all the contents that are already in the file, i.e., all the information about dummyFighter1.
What changes should I make so that information about dummyFighter2 is appended to the file, instead of replacing information about dummyFighter1?
Change the way you open a file setting FileMode.OpenOrCreate to FileMode.Append. Append means "create or append" :
use fileStream = new FileStream(filePath, FileMode.Append)
From MSDN (https://msdn.microsoft.com/fr-fr/library/system.io.filemode%28v=vs.110%29.aspx) :
FileMode.Append opens the file if it exists and seeks to the end of the file, or
creates a new file. This requires FileIOPermissionAccess.Append
permission. FileMode.Append can be used only in conjunction with
FileAccess.Write. Trying to seek to a position before the end of the
file throws an IOException exception, and any attempt to read fails
and throws a NotSupportedException exception.