I am using Django to display a webpage. I want to display details of files from a database, and there can be any number of these files. An HTML form submits a date range to a Django view function. The function checks whether the request is a POST and returns the data between the given dates. When I paginate the results, pressing "next" shows nothing, because that request is a GET rather than a POST. In the view, the data is fetched into a list, and each file's data is itself a list, so the whole result is a list of lists. How can I display these lists with pagination without submitting the form again? Is that possible?
My data is stored in the database by date, and I fetch it for a given range that comes from the HTML page. Here is the form I use to submit the range:
<div class="container">
    <form method="POST">
        {% csrf_token %}
        <input type="date" name="start_date">
        <input type="date" name="end_date">
        <button type="Submit">Get Details</button>
    </form>
</div>
This is the code that I use at the backend to serve the webpage.
def data_range(request):
    if request.method == 'POST':
        form = Form(request.POST)
        try:
            if form.is_valid():
                start_date = form.data.get('start_date')
                end_date = form.data.get('end_date')
                page_data = get_page_details(start_date, end_date)
            else:
                current_date = datetime.datetime.now()
                start_date = end_date = current_date.strftime("%Y-%m-%d")
                page_data = get_page_details(start_date, end_date)
        except Exception:
            return render(request, 'UserInterface/no_data_fort.html', {'list_item_count': 0})
    else:
        return render(request, 'UserInterface/no_data_fort.html', {'list_item_count': 0})
    if page_data:
        # Calculating total files in fortnightly data
        list_item_count = len(page_data)
        # Adding Paginator or pagination
        paginator = Paginator(page_data, 1)
        page = request.GET.get('page')
        data = paginator.get_page(page)
        return render(request, 'UserInterface/fortnightly_range.html', {'page_data_list': data, 'list_item_count': list_item_count})
    else:
        return render(request, 'UserInterface/no_data_fort.html', {'list_item_count': 0})
page_data that I fetch is a list of lists.
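One possible direction, sketched under assumptions rather than taken from the code above: if the form were submitted with method GET instead of POST, the date range would travel in the URL, and every pagination link could carry it along with the page number. The helper name `page_link` and the sample dates are hypothetical; only the parameter names `start_date`, `end_date` and `page` come from the question. The sketch uses only the standard library to show the query string such a link would need.

```python
from urllib.parse import urlencode, parse_qs


def page_link(start_date, end_date, page):
    """Build the query string for a pagination link that keeps the date range."""
    params = {"start_date": start_date, "end_date": end_date, "page": page}
    return "?" + urlencode(params)


# Hypothetical dates, used only for illustration.
link = page_link("2018-01-01", "2018-01-15", 2)
print(link)

# Parsing the query string back shows that all three values survive the round trip.
print(parse_qs(link.lstrip("?")))
```

With links built this way, the view would read the dates from request.GET on every request and rebuild the same list before paginating it. Each page is still a separate request, but the form does not have to be submitted again.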
I want to display cryptocurrency prices on my site. Therefore I parse the latest BTC/USD price from coinmarketcap.com.
Now I want to display them in a list, but first, I don't know how to save the symbol from the JSON to my database, and second, how can I display my view properly? Currently I only save key:value of price_usd, where key is the name of the currency.
views.py
def crypto_ticker(request):
    list_prices = CryptoPrices.objects.get_queryset().order_by('-pk')
    paginator = Paginator(list_prices, 100) # Show 100 prices per page
    page = request.GET.get('page')
    price = paginator.get_page(page)
    return render(request, 'MyProject/crypto_ticker.html', {'price': price})
urls.py
url(r'^crypto_ticker/$', MyProject_views.crypto_ticker, name='crypto_ticker'),
models.py
class CryptoPrices(models.Model):
    symbol = models.CharField(max_length=10)
    key = models.CharField(max_length=30)
    value = models.CharField(max_length=200)
celery update task:
@periodic_task(run_every=(crontab(minute='*/1')), name="Update Crypto rate(s)", ignore_result=True)
def get_exchange_rate():
    api_url = "https://api.coinmarketcap.com/v1/ticker/?limit=100"
    try:
        exchange_rates = requests.get(api_url).json()
        for exchange_rate in exchange_rates:
            CryptoPrices.objects.update_or_create(key=exchange_rate['id'],
                defaults={'value': round(float(exchange_rate['price_usd']), 3)}
            )
        logger.info("Exchange rate(s) updated successfully.")
    except Exception as e:
        print(e)
Surely just adding
symbol= exchange_rate['symbol']
to your update_or_create will work?
The JSON from coinmarketcap sets that as a key in the dictionary, unless you want an image that they use?
In that case you would have to save copies of that image yourself, create a mapping from the text of the symbol to the image itself, and format that on your html output.
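A minimal sketch of that suggestion, assuming the symbol should be updated together with the price rather than used to look the row up. In Django's update_or_create, the plain keyword arguments identify the row and the defaults dictionary holds the fields to set, so the symbol is placed in defaults here. The helper name `build_update_args` and the sample entry are hypothetical, and the numbers are made up.

```python
def build_update_args(exchange_rate):
    """Split one ticker entry into lookup fields and fields to update."""
    lookup = {"key": exchange_rate["id"]}
    defaults = {
        "symbol": exchange_rate["symbol"],
        "value": round(float(exchange_rate["price_usd"]), 3),
    }
    return lookup, defaults


# Hypothetical entry shaped like the fields used above; the numbers are made up.
sample = {"id": "bitcoin", "symbol": "BTC", "price_usd": "6500.12345"}
lookup, defaults = build_update_args(sample)
print(lookup)
print(defaults)

# In the task this would become (not executed here, it needs the Django model):
# CryptoPrices.objects.update_or_create(**lookup, defaults=defaults)
```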
So I'm aiming to scrape 2 tables (in different formats) from a website - https://info.fsc.org/details.php?id=a0240000005sQjGAAU&type=certificate - after using the search bar, iterating this over a list of license codes. I haven't included the loop fully yet, but I added it at the top for completeness.
My issue is that the two tables I want, Product Data and Certificate Data, are in 2 different formats, so I have to scrape them separately. As the Product Data is in the normal "tr" format on the webpage, this bit is easy and I've managed to extract a CSV file of it. The harder bit is extracting the Certificate Data, as it is in "div" form.
I've managed to print the Certificate Data as a list of text, using the class function; however, I need to have it in tabular form saved in a CSV file. As you can see, I've tried several unsuccessful ways of converting it to a CSV, but if you have any suggestions, it would be much appreciated, thank you!! Also, any other general tips to improve my code would be great too, as I am new to web scraping.
import pandas as pd
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

#namelist = open('example.csv', newline='', delimiter = 'example')
#for name in namelist:
#include all of the below
driver = webdriver.Chrome(executable_path="/Users/jamesozden/Downloads/chromedriver")
url = "https://info.fsc.org/certificate.php"
driver.get(url)
search_bar = driver.find_element_by_xpath('//*[@id="code"]')
search_bar.send_keys("FSC-C001777")
search_bar.send_keys(Keys.RETURN)
new_url = driver.current_url
r = requests.get(new_url)
soup = BeautifulSoup(r.content,'lxml')
table = soup.find_all('table')[0]
df, = pd.read_html(str(table))
certificate = soup.find(class_= 'certificatecl').text
##certificate1 = pd.read_html(str(certificate))
driver.quit()
df.to_csv("Product_Data.csv", index=False)
##certificate1.to_csv("Certificate_Data.csv", index=False)
#print(df[0].to_json(orient='records'))
print(certificate)
Output:
Status
Valid
First Issue Date
2009-04-01
Last Issue Date
2018-02-16
Expiry Date
2019-04-01
Standard
FSC-STD-40-004 V3-0
What I want but over hundreds/thousands of license codes (I just manually created this one sample in Excel):
Desired output
EDIT
So whilst this is now working for Certificate Data, I also want to scrape the Product Data and output that into another .csv file. However currently it is only printing 5 copies of the product data for the final license code which is not what I want.
New Code:
df = pd.read_csv("MS_License_Codes.csv")
codes = df["License Code"]
def get_data_by_code(code):
    data = [
        ('code', code),
        ('submit', 'Search'),
    ]
    response = requests.post('https://info.fsc.org/certificate.php', data=data)
    soup = BeautifulSoup(response.content, 'lxml')
    status = soup.find_all("label", string="Status")[0].find_next_sibling('div').text
    first_issue_date = soup.find_all("label", string="First Issue Date")[0].find_next_sibling('div').text
    last_issue_date = soup.find_all("label", string="Last Issue Date")[0].find_next_sibling('div').text
    expiry_date = soup.find_all("label", string="Expiry Date")[0].find_next_sibling('div').text
    standard = soup.find_all("label", string="Standard")[0].find_next_sibling('div').text
    return [code, status, first_issue_date, last_issue_date, expiry_date, standard]
# Just insert here output filename and codes to parse...
OUTPUT_FILE_NAME = 'Certificate_Data.csv'
#codes = ['C001777', 'C001777', 'C001777', 'C001777']
df3=pd.DataFrame()
with open(OUTPUT_FILE_NAME, 'w') as f:
    writer = csv.writer(f)
    for code in codes:
        print('Getting code# {}'.format(code))
        writer.writerow((get_data_by_code(code)))
        table = soup.find_all('table')[0]
        df1, = pd.read_html(str(table))
        df3 = df3.append(df1)
df3.to_csv('Product_Data.csv', index = False, encoding='utf-8')
Here's all you need.
No chromedriver. No pandas. Forget about it in context of scraping.
import requests
import csv
from bs4 import BeautifulSoup
# This is all what you need for your task. Really.
# No chromedriver. Don't use it for scraping. EVER.
# No pandas. Don't use it for writing csv. It's not what pandas was made for.
#Function to parse single data page based on single input code.
def get_data_by_code(code):
    # Parameters to build POST-request.
    # "type" and "submit" params are static. "code" is your desired code to scrape.
    data = [
        ('type', 'certificate'),
        ('code', code),
        ('submit', 'Search'),
    ]
    # POST-request to gain page data.
    response = requests.post('https://info.fsc.org/certificate.php', data=data)
    # "soup" object to parse html data.
    soup = BeautifulSoup(response.content, 'lxml')
    # "status" variable. Contains first's found [LABEL tag, with text="Status"] following sibling DIV text. Which is status.
    status = soup.find_all("label", string="Status")[0].find_next_sibling('div').text
    # Same for issue dates... etc.
    first_issue_date = soup.find_all("label", string="First Issue Date")[0].find_next_sibling('div').text
    last_issue_date = soup.find_all("label", string="Last Issue Date")[0].find_next_sibling('div').text
    expiry_date = soup.find_all("label", string="Expiry Date")[0].find_next_sibling('div').text
    standard = soup.find_all("label", string="Standard")[0].find_next_sibling('div').text
    # Returning found data as list of values.
    return [response.url, status, first_issue_date, last_issue_date, expiry_date, standard]
# Just insert here output filename and codes to parse...
OUTPUT_FILE_NAME = 'output.csv'
codes = ['C001777', 'C001777', 'C001777', 'C001777']
with open(OUTPUT_FILE_NAME, 'w') as f:
    writer = csv.writer(f)
    for code in codes:
        print('Getting code# {}'.format(code))
        #Writing list of values to file as single row.
        writer.writerow((get_data_by_code(code)))
Everything is really straightforward here. I'd suggest you spend some time in Chrome dev tools "network" tab to have a better understanding of request forging, which is a must for scraping tasks.
In general, you don't need to run Chrome to click the "search" button; you need to forge the request generated by this click. The same goes for any form and AJAX call.
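To make "forging the request" concrete, here is a small sketch that only prepares the POST request the search form would submit, without sending anything. It swaps in the standard library's urllib for the requests package used above, purely so that it runs without extra installs; the helper name `build_search_request` is hypothetical, while the URL and the three form fields are the ones from the code above.

```python
from urllib.parse import urlencode, parse_qsl
from urllib.request import Request


def build_search_request(code):
    """Prepare (but do not send) the POST request the search form would submit."""
    form_fields = [
        ("type", "certificate"),
        ("code", code),
        ("submit", "Search"),
    ]
    body = urlencode(form_fields).encode("ascii")
    return Request("https://info.fsc.org/certificate.php", data=body, method="POST")


request = build_search_request("C001777")
print(request.get_method())
print(request.full_url)
print(parse_qsl(request.data.decode("ascii")))
```

The printed body is what the "network" tab would show as form data for the same search, which makes it easy to compare a forged request against the one the browser sends.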
well... you should sharpen your skills (:
df3=pd.DataFrame()
with open(OUTPUT_FILE_NAME, 'w') as f:
    writer = csv.writer(f)
    for code in codes:
        print('Getting code# {}'.format(code))
        writer.writerow((get_data_by_code(code)))
        ### HERE'S THE PROBLEM:
        # "soup" variable is declared inside of "get_data_by_code" function.
        # So you can't use it in outer context.
        table = soup.find_all('table')[0] # <--- you should move this line to
        #definition of "get_data_by_code" function and return its value somehow...
        df1, = pd.read_html(str(table))
        df3 = df3.append(df1)
df3.to_csv('Product_Data.csv', index = False, encoding='utf-8')
For example, you can return a dictionary of values from the "get_data_by_code" function:
def get_data_by_code(code):
    ...
    table = soup.find_all('table')[0]
    return dict(row=row, table=table)
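A rough sketch of how the calling loop might consume such a dictionary, so that each code contributes its own certificate row and its own product rows. The function body here is a stand-in that returns hypothetical placeholder values instead of scraping; the column names, the second code and the in-memory buffer are assumptions for illustration only.

```python
import csv
import io


def get_data_by_code(code):
    """Stand-in for the real scraper: returns the CSV row and the product table together."""
    # Hypothetical placeholder values; the real function would parse them from the page.
    row = [code, "Valid"]
    table = [["Column A", "Column B"], ["a1", "b1"]]
    return dict(row=row, table=table)


certificate_csv = io.StringIO()  # stands in for Certificate_Data.csv
product_rows = []                # collected product rows for all codes

writer = csv.writer(certificate_csv)
for code in ["C001777", "C000001"]:
    result = get_data_by_code(code)
    writer.writerow(result["row"])
    # Skip the header row and tag each product row with its code.
    for product in result["table"][1:]:
        product_rows.append([code] + product)

print(certificate_csv.getvalue())
print(product_rows)
```

Because the table is read from the value returned for the current code, the product data no longer depends on a variable left over from an earlier request.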
I'm passing a list of HTML elements to views.py from HTML through POST, but I'm only getting the last value.
Here is the HTML code that I use; there are multiple lines like this one:
<input name="idborrow[]" id="borrow" value='+element[i].id+'>
And here is my code in views.py:
if request.method == 'POST':
    idborrow = request.POST.get('idborrow[]', '')
    print (idborrow)
In the console, it just prints the last value. How do I get the whole list of values?
Try using getlist
Ex:
request.POST.getlist('idborrow[]')
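The difference can be illustrated without Django. On Django's QueryDict, get() returns only the last value stored under a key, while getlist() returns all of them. The sketch below mimics both behaviours with the standard library's urllib.parse on a hypothetical form body; it is an analogy, not Django's own code.

```python
from urllib.parse import parse_qs, parse_qsl

# Hypothetical form body with three inputs that share the name "idborrow[]".
body = "idborrow%5B%5D=3&idborrow%5B%5D=7&idborrow%5B%5D=9"

# Every submitted value, comparable to request.POST.getlist('idborrow[]').
all_values = parse_qs(body)["idborrow[]"]

# Only the last value, comparable to request.POST.get('idborrow[]').
last_value = dict(parse_qsl(body))["idborrow[]"]

print(all_values)
print(last_value)
```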
I have managed to get the google charts plugin (http://www.web2pyslices.com/slice/show/1721/google-charts-plugin) to work with my web2py application, using the JSON example data (hard-coded into default.py).
I am struggling to use my own data. The chart does not work with my JSON data, which is returned as:
{"data": [["2014-03-28", 1000], ["2014-03-25", 1100]]}
When I hardcode the data with the titles, the chart works:
data = [['Date','Sales'],["2014-03-28",1000],["2014-03-25",1100]]
This returns JSON as:
{"data": [["Date", "Sales"], ["2014-03-28", 1000], ["2014-03-25", 1100]]}
The code for this is:
def return_data():
    data = [['Date','Sales'],["2014-03-28",1000],["2014-03-25",1100]]
    return dict(data=data)
Below is the code I am using in default.py to return the information from the database, the query works, it's the chart that doesn't!:
def return_data():
    sales = db().select(db.sales.quantity, db.sales.date)
    data = [[row.date,row.quantity] for row in sales]
    return dict(data=data)
Somehow, I think I need to add the 'Date' and 'Sales' labels to the start of the JSON data, but I have not managed to do this. Do I need some sort of encoding step? Do I need to use simplejson, or can this be done without it?
Many thanks
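One way this might be done, sketched with plain Python and no extra encoding step: the header is just another list, so it can be placed in front of the rows before the dictionary is returned. The rows below are hypothetical stand-ins for the query result (dictionaries instead of web2py rows), reusing the two example records from the question.

```python
import datetime
import json

# Hypothetical rows standing in for the result of the database query.
sales = [
    {"date": datetime.date(2014, 3, 28), "quantity": 1000},
    {"date": datetime.date(2014, 3, 25), "quantity": 1100},
]

# Header row first, then one [date, quantity] pair per record.
data = [["Date", "Sales"]] + [[row["date"].isoformat(), row["quantity"]] for row in sales]

print(json.dumps(dict(data=data)))
```

In the web2py controller the same idea would be a list containing ['Date', 'Sales'] concatenated with the existing list comprehension over the query rows.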