In the script below, I try to call several functions via Flask from my HTML homepage. Each function calculates a number of pending items that I want to display in a table on my homepage. To do this, I have:
computed with Pandas a number of events from a csv file. I tested these functions separately and they work
created a table on my HTML homepage with the different categories for which I would like to display the number of events from my csv file
used html_page.replace to replace the placeholder $$NUMBERxx$$ in the HTML table with the result of my functions in Python
for each function, created a route with the HTML link and the function concerned, so that Flask knows that, after computing the result, it should go to the homepage and replace $$NUMBER1$$ with result1 or $$NUMBER2$$ with result2 in the table
Unfortunately, when I run Flask, nothing happens: I still see $$NUMBER1$$ and $$NUMBER2$$ on my webpage.
Could you please help me correct my script below?
Python code:
import flask
import csv
import pandas as pd
import numpy as np
app = flask.Flask("app_monitoringissues")
def get_html(page_name):
    html_file = open(page_name + ".html")
    content = html_file.read()
    html_file.close()
    return content

@app.route("/homepage/<count_pending1>")
def count_pending1():
    html_page = get_html("homepage")
    df = pd.read_csv("sortdata.csv")
    count1 = len(df[df["Status_Issue"].astype(str).str.contains("Pending-to be checked")])
    count2 = df["Status_Issue"].isna().sum()
    result1 = count1 + count2
    return html_page.replace("$$NUMBER1$$", str(result1))

@app.route("/homepage/<count_pending2>")
def count_pending2():
    html_page = get_html("homepage")
    df = pd.read_csv("sortdata.csv")
    count1 = len(df[df["Status_Issue"].astype(str).str.contains("Pending CP")])
    result2 = count1
    return html_page.replace("$$NUMBER2$$", str(result2))
HTML code:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Homepage</title>
</head>
<body>
<div class="head">
<h1 id="title homepage">monitoring screen</h1>
<div class="tablesupervisionarea">
<table class="tablesupervision">
<thead class="theadsupervision">
<tr>
<th>Pending Supervision</th>
<th> </th>
</tr>
</thead>
<tbody>
<tr>
<th>Need to be checked by Tax Regulatory</th>
<td><ol>$$NUMBER1$$</ol></td>
</tr>
<tr>
<th>Pending with CP</th>
<td><ol>$$NUMBER2$$</ol></td>
</tr>
</tbody>
</table>
</div>
</div>
</body>
</html>
This is not how you use Flask. Use render_template instead. Have a look at the doc first to understand the concept.
As an example, this function:
@app.route("/homepage/<count_pending2>")
def count_pending2():
    html_page = get_html("homepage")
    df = pd.read_csv("sortdata.csv")
    count1 = len(df[df["Status_Issue"].astype(str).str.contains("Pending CP")])
    result2 = count1
    return html_page.replace("$$NUMBER2$$", result2)
should look like:
from flask import render_template

@app.route("/homepage")
def count_pending2():
    df = pd.read_csv("sortdata.csv")
    result = len(df[df["Status_Issue"].astype(str).str.contains("Pending CP")])
    return render_template("homepage.html", result=result)
And in the relevant HTML template, add a tag like this that will be replaced with appropriate values:
{{ result }}
Have a look at the docs; it's easy, you'll see.
If I may suggest, try to improve the naming of variables and function names: count_pending1/2 etc are all very similar and do not give any clue about the purpose of the function. The code should be more explicit - before you even look at the function it should be obvious what it is supposed to do.
When you need to review your code, it would help a lot to have meaningful names to immediately spot the relevant section you want to edit. You already have two functions that are named almost the same and are quite similar. Ask yourself if you really need two functions or even more. Perhaps a simple function with a conditional block would make more sense than repeating code and making the whole program longer than it could be.
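Following that suggestion, the two near-duplicate functions could collapse into one parameterised helper (a sketch only: `count_issues` is a hypothetical name, and the column name and status labels are taken from the question's CSV):

```python
import pandas as pd

def count_issues(df, pattern, include_missing=False):
    """Count rows whose Status_Issue contains `pattern`;
    optionally also count rows where Status_Issue is missing."""
    status = df["Status_Issue"]
    count = status.astype(str).str.contains(pattern).sum()
    if include_missing:
        count += status.isna().sum()
    return int(count)

# Tiny in-memory stand-in for sortdata.csv:
df = pd.DataFrame({"Status_Issue": ["Pending-to be checked", "Pending CP", None]})
print(count_issues(df, "Pending-to be checked", include_missing=True))  # 2
print(count_issues(df, "Pending CP"))  # 1
```

A single route can then pass both counts to `render_template`, and `{{ ... }}` placeholders in the template take over the job of the `$$NUMBER$$` markers.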
I want to scrape two pieces of data from a website:
https://www.moneymetals.com/precious-metals-charts/gold-price
Specifically I want the "Gold Price per Ounce" and the "Spot Change" percent two columns to the right of it.
Using only Python standard libraries, is this possible? A lot of tutorials use the HTML element id to scrape effectively but inspecting the source for this page, it's just a table. Specifically I want the second and fourth <td> which appear on the page.
It's possible to do it with standard python libraries; ugly, but possible:
import urllib.request
from html.parser import HTMLParser

URL = 'https://www.moneymetals.com/precious-metals-charts/gold-price'

page = urllib.request.Request(URL)
result = urllib.request.urlopen(page)
resulttext = result.read()

class MyHTMLParser(HTMLParser):
    gold = []
    def handle_data(self, data):
        self.gold.append(data)

parser = MyHTMLParser()
parser.feed(str(resulttext))

for i in parser.gold:
    if 'Gold Price per Ounce' in i:
        target = parser.gold.index(i)  # get the index location of the heading
        print(parser.gold[target + 2])  # your target items are 2, 5 and 9 positions down in the list
        print(parser.gold[target + 5].replace('\\n', ''))
        print(parser.gold[target + 9].replace('\\n', ''))
Output (as of the time the url was loaded):
$1,566.70
8.65
0.55%
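A slightly less fragile variant collects only table-cell text, so the offsets don't depend on every stray text node on the page (still a sketch: the inline HTML below is a stand-in, and the real page layout may differ):

```python
from html.parser import HTMLParser

class TdParser(HTMLParser):
    """Collect the text of every <td> on the page, in document order."""
    def __init__(self):
        super().__init__()
        self.in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False

    def handle_data(self, data):
        if self.in_td:
            self.cells[-1] += data.strip()

parser = TdParser()
parser.feed("<table><tr><td>Gold Price per Ounce</td>"
            "<td>$1,566.70</td><td>8.65</td><td>0.55%</td></tr></table>")
print(parser.cells[1], parser.cells[3])  # $1,566.70 0.55%
```

Indexing by cell position (second and fourth `<td>`) matches what the question asked for, without counting newlines and whitespace nodes.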
I am working on my second Python scraper and keep running into the same problem. I would like to scrape the website shown in the code below. I would like to be able to input parcel numbers and see if their Property Use Code matches. However, I am not sure if my scraper is finding the correct row in the table. Also, I am not sure how to use the if statement when the use code is not 3730.
Any help would be appreciated.
from bs4 import BeautifulSoup
import requests

parcel = input("Parcel Number: ")
web = "https://mcassessor.maricopa.gov/mcs.php?q="
web_page = web + parcel
web_header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
response = requests.get(web_page, headers=web_header, timeout=100)
soup = BeautifulSoup(response.content, 'html.parser')
table = soup.find("td", class_="Property Use Code")
first_row = table.find_all("td")[1]
if first_row is '3730':
    print(parcel)
else:
    print('N/A')
There's no td with class "Property Use Code" in the html you're looking at - that is the text of a td. If you want to find that row, you can use
td = soup.find('td', text="Property Use Code")
and then, to get the next td in that row, you can use:
otherTd = td.find_next_sibling()
or, if you want them all:
otherTds = td.find_next_siblings()
It's not clear to me what you want to do with the values of these tds, but you'll want to use the text attribute to access them: your first_row is '3730' will always be False, because first_row is a bs4.element.Tag object here and '3730' is a str. You can, however, get useful information from otherTd.text == '3730'.
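Putting those pieces together, a self-contained sketch of the suggested fix looks like this (the inline table below is an assumption about what the assessor page returns, not its actual markup):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's markup: a label cell followed
# by a value cell in the same row.
html = """<table><tr>
  <td>Property Use Code</td><td>3730</td>
</tr></table>"""

soup = BeautifulSoup(html, "html.parser")
label = soup.find("td", text="Property Use Code")  # match by cell text, not class
value = label.find_next_sibling("td")              # the next cell in that row
if value.text.strip() == "3730":                   # compare strings with ==, not `is`
    print("match")
else:
    print("N/A")
```

The key changes from the question's version: look the cell up by its text, take its sibling, and compare `value.text` with `==` rather than comparing a Tag object with `is`.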
Background: I need to write an HTML table parser in Python for HTML tables with varying colspans and rowspans. Upon some research I stumbled upon this gem. It works well for simple cases without wacky colspans and rowspans; however, I've run into a bug. The code assumes that if an element has a colspan of 3, it belongs to three different table headers, while it really only belongs to the table header the colspan falls in the center of. An example of this can be seen at http://en.wiktionary.org/wiki/han#Swedish (open up the declension table under the Swedish section). The code incorrectly returns that "hans" (possessive-neuter-3rd person masculine) belongs to possessive-common-3rd person masculine and possessive-plural-3rd person masculine because it has a colspan of 3. I've tried adding a check to table_to_2d_dict which would create a counter if a colspan > 1, and only count the element as part of a header if the counter was equal to colspan // 2 + 1 (this returns the median of range(1, colspan + 1), which is the table header the element should be counted under). However, when I implement this check in the location specified in the code below, it doesn't work. To be honest, this probably stems from my lack of understanding of how this code works, so...
Question: Can someone explain what this code does and why it malfunctions as described above? If someone can implement a fix that'd be great but right now I'm primarily concerned with understanding the code. Thanks
Below is the code with comments that I've added to highlight parts of the code I understand and parts I don't.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from collections import defaultdict

def table_to_list(table):
    dct = table_to_2d_dict(table)
    return list(iter_2d_dict(dct))

def table_to_2d_dict(table):
    result = defaultdict(lambda: defaultdict(str))
    for row_i, row in enumerate(table.xpath('./tr')):  # these double for loops iterate over each element in the table
        for col_i, col in enumerate(row.xpath('./td|./th')):
            colspan = int(col.get('colspan', 1))  # gets colspan attr of the element, if none assumes it's 1
            rowspan = int(col.get('rowspan', 1))  # gets rowspan attr of the element, if none assumes it's 1
            col_data = col.text_content()  # gets raw text inside element
            # WHAT DOES THIS DO? :(
            while row_i in result and col_i in result[row_i]:
                col_i += 1
            for i in range(row_i, row_i + rowspan):
                for j in range(col_i, col_i + colspan):
                    result[i][j] = col_data
    return result

# what does this do? :(
def iter_2d_dict(dct):
    for i, row in sorted(dct.items()):
        cols = []
        for j, col in sorted(row.items()):
            cols.append(col)
        yield cols

if __name__ == '__main__':
    import lxml.html
    from pprint import pprint
    doc = lxml.html.parse('tables.html')
    for table_el in doc.xpath('//table'):
        table = table_to_list(table_el)
        pprint(table)
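For what it's worth, the puzzling while loop can be exercised in isolation. This self-contained sketch mimics table_to_2d_dict on a mocked two-row table (no lxml needed), where the first cell has rowspan=2: the loop's job is to skip over a column that an earlier rowspan or colspan already filled.

```python
from collections import defaultdict

# Mocked table: row 0 is [A (rowspan=2), B]; row 1 is just [C].
# Each cell is (text, colspan, rowspan).
rows = [
    [("A", 1, 2), ("B", 1, 1)],
    [("C", 1, 1)],
]

result = defaultdict(lambda: defaultdict(str))
for row_i, row in enumerate(rows):
    for col_i, (text, colspan, rowspan) in enumerate(row):
        # Skip columns already claimed by a span from a previous cell:
        # without this, "C" would overwrite the "A" carried down by rowspan.
        while row_i in result and col_i in result[row_i]:
            col_i += 1
        for i in range(row_i, row_i + rowspan):
            for j in range(col_i, col_i + colspan):
                result[i][j] = text

print([[result[i][j] for j in sorted(result[i])] for i in sorted(result)])
# [['A', 'B'], ['A', 'C']]  -- the rowspan pushed "C" one column right
```

This also shows why a colspan of 3 writes the same text into three columns (the inner double for loop), which is exactly the behavior the question wants to change.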
I want to do paging, but I only need to know the current page number; I will call the web service function, send this parameter, and receive the corresponding data. So I only want to know: how can I find out the current page number? I'm writing my project in Django and I create the page with XSL. If I know the page number, I think I can write this in urls.py:
url(r'^ask/(\d+)/$',
    'ask',
    name='ask'),
and call the function in views.py like:
ask(request, pageNo)
but I don't know where to put the pageNo variable in the HTML page (so, for example, with pageNo=2, I can do pageNo+1 or pageNo-1 to make a URL like 127.0.0.1/ask/3/ or 127.0.0.1/ask/1/). To make my question clearer: I want to know how I can do this while we don't have any variables in the HTML.
Sorry for my crazy question; I'm new to creating websites and also to Django. :">
I'm creating my HTML page with XSLT, so I send the whole HTML page (to show.html, which contains only {{str}}).
def ask(request):
    service = GetConfigLocator().getGetConfigHttpSoap11Endpoint()
    myRequest = GetConfigMethodRequest()
    myXml = service.GetConfigMethod(myRequest)
    myXmlstr = myXml._return
    styledoc = libxml2.parseFile("ask.xsl")
    style = libxslt.parseStylesheetDoc(styledoc)
    doc = libxml2.parseDoc(myXmlstr)
    result = style.applyStylesheet(doc, None)
    out = style.saveResultToString(result)
    ok = mark_safe(out)
    style.freeStylesheet()
    doc.freeDoc()
    result.freeDoc()
    return render_to_response("show.html", {
        'str': ok,
    }, context_instance=RequestContext(request))
I'm not working with a DB; I just receive an XML file and parse it, so I don't have contact_list = Contacts.objects.all(). Can I still use this approach? Should I leave the first parameter in paginator = Paginator(contact_list, 25) blank?
If you use the standard Django paginator, it sends the user to the URL http://example.com/?page=N, where N is the page number.
So,
# urls.py
url('^ask/$', 'ask', name='viewName'),
You can get page number in views:
# views.py
def ask(request):
    page = request.GET.get('page', 1)
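And yes, the paginator only needs a sliceable sequence, so the list you build from the parsed XML works just as well as a queryset. The underlying arithmetic is simple either way; here is a minimal framework-free sketch (`page_slice` is a hypothetical helper, assuming 25 items per page):

```python
def page_slice(items, page, per_page=25):
    """Return the items for 1-indexed `page`, plus prev/next page numbers
    (None when there is no previous or next page)."""
    total_pages = max(1, -(-len(items) // per_page))  # ceiling division
    page = min(max(page, 1), total_pages)             # clamp out-of-range pages
    start = (page - 1) * per_page
    prev_page = page - 1 if page > 1 else None
    next_page = page + 1 if page < total_pages else None
    return items[start:start + per_page], prev_page, next_page

items = list(range(60))  # stand-in for the list parsed from the XML
chunk, prev_page, next_page = page_slice(items, 2)
print(len(chunk), prev_page, next_page)  # 25 1 3
```

In the template, prev_page and next_page are exactly what you interpolate into the ?page=N links.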
I've got an XML file.
<key>457</key>
<dict>
<key>Track ID</key><integer>457</integer>
<key>Name</key><string>Love me do</string>
<key>Artist</key><string>The Beatles</string>
<key>Album Artist</key><string>The Beatles</string>
<key>Composer</key><string>John Lennon/Paul McCartney</string>
<key>Album</key><string>The Beatles No.1</string>
<key>Genre</key><string>Varies</string>
<key>Kind</key><string>AAC audio file</string>
</dict>
For these purposes I've removed a lot of the file (this is one song, and there are about 20-30 more lines of XML per song). What I'd like to do is extract the 'Artist' string from each song, remove all of the repeated strings, and output the result to an HTML file; preferably in a way that auto-refreshes when a new version of the .xml is found, thus keeping the file up to date, but if that overcomplicates it, that's fine.
I've looked into ways of doing it with jQuery, and PHP has been suggested, but I'm unsure which is better/cleaner, and I'm unsure how I would go about it in either.
Many thanks,
Henry.
I would do this in PHP: put your XML into a string, then (because only you are going to use this) encode it to JSON and decode it into an assoc array, run a foreach loop to extract the artists, remove the duplicates, and save the result as HTML. Then add a cron job to run this periodically and regenerate the HTML. Run this code, then link to the file it produces.
$contents = '<key>Blah.... lots of XML';
$xml = simplexml_load_string($contents);
$json = json_encode($xml);
$array = json_decode($json, true);
print_r($array);
Once I know the structure of the array that is produced, I can complete the code. But it would look something like this:
foreach ($array['dict']['artist'] as $artist) {
    $artists[] = $artist;
}
// Now $artists holds an array of the artists
$artists = array_unique($artists);
// Now there are no duplicates
foreach ($artists as $artist) {
    $output .= '<p>' . $artist . '</p>';
}
// Now each artist is put in its own paragraph.
// Either output the output
echo $output;
// Or save it to a file (in this case, 'artists.html')
$fh = fopen('artists.html', 'w') or die("Can't open file");
fwrite($fh, $output);
fclose($fh);
This does not work completely yet, as the line in the first foreach loop needs a bit of tweaking, but this is a starting point.
What exactly are you trying to achieve? If you need HTML files that are periodically regenerated based on the XML files, then you probably want to write a program (for example, the BeautifulSoup Python library allows you to parse XML/HTML files quite easily) for it and run it every time you need to update the HTML files (you can also set up a cron job for it).
If you need to be able to fetch the data from XML on the fly, you can use some JavaScript library and load the XML from an xml file, then add it to the page dynamically.
For example, this Python program will parse an XML file (file.xml) and create an HTML file (song_information.html) that contains data from the XML file.
from BeautifulSoup import BeautifulStoneSoup
f = open("file.xml")
soup = BeautifulStoneSoup(f.read())
f.close()
html = """<!DOCTYPE html>
<html>
<head>
<title>Song information</title>
</head>
<body>
"""
for key in soup.dict.findAll('key'):
    html += "<h1>%s</h1>\n" % key.contents[0]
    html += "<p>%s</p>\n" % key.nextSibling.contents[0]
html += """</body>
</html>
"""
f = open("song_information.html", "w")
f.write(html)
f.close()
It will write the following HTML to the song_information.html file:
<!DOCTYPE html>
<html>
<head>
<title>Song information</title>
</head>
<body>
<h1>Track ID</h1>
<p>457</p>
<h1>Name</h1>
<p>Love me do</p>
<h1>Artist</h1>
<p>The Beatles</p>
<h1>Album Artist</h1>
<p>The Beatles</p>
<h1>Composer</h1>
<p>John Lennon/Paul McCartney</p>
<h1>Album</h1>
<p>The Beatles No.1</p>
<h1>Genre</h1>
<p>Varies</p>
<h1>Kind</h1>
<p>AAC audio file</p>
</body>
</html>
Of course, this is simplified. If you need to implement unicode support, you will want to edit it like this:
from BeautifulSoup import BeautifulStoneSoup
import codecs
f = codecs.open("file.xml", "r", "utf-8")
soup = BeautifulStoneSoup(f.read())
f.close()
html = """<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Song information</title>
</head>
<body>
"""
for key in soup.dict.findAll('key'):
    html += "<h1>%s</h1>\n" % key.contents[0]
    html += "<p>%s</p>\n" % key.nextSibling.contents[0]
html += """</body>
</html>
"""
f = codecs.open("song_information.html", "w", "utf-8")
f.write(html)
f.close()
Also, you will probably need to generate more complex HTML, so you will likely want to try some template systems like Jinja2.
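On current Python, the same extraction no longer needs BeautifulStoneSoup at all: the standard library's xml.etree.ElementTree can walk the `<key>`/value pairs directly. A sketch against a trimmed version of the question's XML (the real iTunes library file wraps these dicts in more layers, so `extract_artists` is a simplified, hypothetical helper):

```python
import xml.etree.ElementTree as ET

def extract_artists(xml_fragment):
    """Collect unique Artist values from an iTunes-style <dict> fragment,
    where each <key> element is immediately followed by its value element."""
    root = ET.fromstring(xml_fragment)
    children = list(root)
    artists = []
    for i, el in enumerate(children):
        if el.tag == "key" and el.text == "Artist":
            artists.append(children[i + 1].text)  # the <string> right after the key
    return sorted(set(artists))  # set() drops the repeated artists

sample = ("<dict>"
          "<key>Name</key><string>Love me do</string>"
          "<key>Artist</key><string>The Beatles</string>"
          "</dict>")
print(extract_artists(sample))  # ['The Beatles']
```

The deduplicated list can then be written into the HTML skeleton exactly as above, and the same cron-job idea keeps the output file fresh.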