python color entire pandas dataframe rows based on column values - html

I have a script that downloads a .csv, does some manipulation, and then emails pandas DataFrames in a nice HTML format using df.to_html.
I would like to enhance these tables by highlighting, or coloring, different rows based on their text value in a specific column.
I tried using the pandas Styler, which appears to work; however, I cannot convert the result to HTML using to_html. I get "AttributeError: 'str' object has no attribute 'to_html'".
Is there another way to do this?
As an example, let's say my DataFrame looks like the following and I want to highlight all rows for each manufacturer, i.e. use three different colors for Ford, Chevy, and Dodge:
Year Color Manufacturer
2011 Red Ford
2010 Yellow Ford
2000 Blue Chevy
1983 Orange Dodge
I noticed I can pass formatters into to_html, but it appears that they cannot do the coloring I am trying to accomplish. I would like to be able to do something like:
def colorred():
    return ['background-color: red']
def color_row(value):
    # Compare strings with ==, not "is" (identity comparison is unreliable for strings)
    if value == "Ford":
        result = colorred()
        return result
df1.to_html("test.html", escape=False, formatters={"Manufacturer": color_row})

I'm surprised this was never answered. Looking back at it, I do not believe this is even possible with to_html formatters. After revisiting this several times, I have found a solution I am happy with. I have not seen anything close to this online, so I hope it helps someone else.
import pandas as pd
d = {'Year': [2011, 2010, 2000, 1983],
     'Color': ['Red', 'Yellow', 'Blue', 'Orange'],
     'Manufacturer': ['Ford', 'Ford', 'Chevy', 'Dodge']}
df = pd.DataFrame(d)
print(df)
def color_rows(s):
    # Cast to object so numeric columns such as Year can hold CSS strings
    df = s.copy().astype(object)
    # Key:value dictionary of manufacturer:color
    color_map = {}
    # Unique column values
    manufacturers = df['Manufacturer'].unique()
    # Only four colors are listed, so more than four manufacturers would run out
    colors_to_use = ['background-color: #ABB2B9', 'background-color: #EDBB99', 'background-color: #ABEBC6',
                     'background-color: #AED6F1']
    # Loop over our column values and associate one color to each
    for manufacturer in manufacturers:
        color_map[manufacturer] = colors_to_use[0]
        colors_to_use.pop(0)
    for index, row in df.iterrows():
        if row['Manufacturer'] in manufacturers:
            manufacturer = row['Manufacturer']
            # Get the color to use based on this row's Manufacturer value
            my_color = color_map[manufacturer]
            # Update the whole row using loc
            df.loc[index, :] = my_color
        else:
            df.loc[index, :] = 'background-color: '
    return df
df.style.apply(color_rows, axis=None)
Output: [image: Pandas row coloring]
Since I do not have the reputation to embed images, here is how I email it: I convert it to HTML with the following.
styled = df.style.apply(color_rows, axis=None).set_table_styles(
    [{'selector': '.row_heading',
      'props': [('display', 'none')]},
     {'selector': '.blank.level0',
      'props': [('display', 'none')]}])
# Styler.render() was removed in pandas 2.0; on pandas 1.3+ use styled.to_html() instead
html = styled.render()
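As a complement to the Styler approach above, here is a minimal sketch that writes the color directly onto each <tr> as an inline style, using only the standard library. It may be worth considering if a receiving mail client does not honor the <style> block that Styler output relies on, which is an assumption to check against your own clients. The function name rows_to_html and its parameters are hypothetical; the rows could come from something like df.to_dict('records').

```python
from html import escape
from itertools import cycle

def rows_to_html(rows, columns, key, palette):
    """Render dict rows as an HTML table, coloring each row by its `key` value.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Assign one palette color per distinct key value, in order of first appearance.
    # cycle() reuses colors if there are more distinct values than palette entries.
    color_for = {}
    colors = cycle(palette)
    for row in rows:
        if row[key] not in color_for:
            color_for[row[key]] = next(colors)

    header = "".join(f"<th>{escape(str(c))}</th>" for c in columns)
    parts = ["<table>", f"<tr>{header}</tr>"]
    for row in rows:
        cells = "".join(f"<td>{escape(str(row[c]))}</td>" for c in columns)
        parts.append(f'<tr style="background-color: {color_for[row[key]]}">{cells}</tr>')
    parts.append("</table>")
    return "\n".join(parts)

rows = [
    {"Year": 2011, "Color": "Red", "Manufacturer": "Ford"},
    {"Year": 2010, "Color": "Yellow", "Manufacturer": "Ford"},
    {"Year": 2000, "Color": "Blue", "Manufacturer": "Chevy"},
    {"Year": 1983, "Color": "Orange", "Manufacturer": "Dodge"},
]
html_table = rows_to_html(
    rows,
    ["Year", "Color", "Manufacturer"],
    "Manufacturer",
    ["#ABB2B9", "#EDBB99", "#ABEBC6", "#AED6F1"],
)
```

Both Ford rows end up with the first palette color, and each further manufacturer takes the next one.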

Related

Callback for multivalue dropdown

I am pretty new to Plotly Dash and have been struggling, especially with the callback for a multi-value dropdown, and would really appreciate any help. I followed a tutorial and created a pie chart that is shown when a single pillar value (from my data) is selected. I would like to achieve two things:
The default or initial chart should show all pillars and the number of projects
Multi-selection of pillar values
My main issue is creating the callback for these. Thank you in advance for any help!
Here is my code
import dash
from dash import dcc, html, Input, Output  # Dash 2.x import style
import plotly.express as px
# df is assumed to be a DataFrame loaded earlier, with 'Pillar' and 'Project No' columns
app = dash.Dash(__name__)
all = df.Pillar.unique()
app.layout = html.Div([
    html.H1("PM dashboard"),
    dcc.Dropdown(id='pillar-choice',
                 options=[{'label': x, 'value': x}
                          for x in all],
                 value='Service Provider',
                 multi=False),
    dcc.Graph(id='my-graph',
              figure={}),
])
@app.callback(
    Output(component_id='my-graph', component_property='figure'),
    Input(component_id='pillar-choice', component_property='value')
)
def interactive_graphs(value_pillar):
    print(value_pillar)
    dff = df[df.Pillar == value_pillar]
    fig = px.pie(data_frame=dff, names='Pillar', values='Project No')
    return fig
if __name__ == '__main__':
    app.run_server()
I think the problem here is that, once the dropdown is set to multi=True, value_pillar will be a list, so you need to do something like:
dff = df[df.Pillar.isin(value_pillar)]
And if you want to show everything by default, you'll need to check the value of that argument for your default value and, if it matches the default, avoid filtering.
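The default-value check described above can be sketched as a small helper that normalizes whatever the dropdown delivers. This is only a sketch: pillars_to_show is a hypothetical name, and it assumes that "nothing selected" (None or an empty list) should mean "show every pillar".

```python
def pillars_to_show(selected, all_pillars):
    """Return the list of pillar values the chart should include.

    Hypothetical helper; `selected` is the raw dropdown value.
    """
    if not selected:
        # None or [] means nothing is chosen, so skip filtering entirely
        return list(all_pillars)
    if isinstance(selected, str):
        # multi=False delivers a single string
        return [selected]
    # multi=True delivers a list of strings
    return list(selected)

# Inside the callback this would be used roughly as:
#     dff = df[df.Pillar.isin(pillars_to_show(value_pillar, all))]
```

With this in place the dropdown could start with no value and multi=True, and the initial chart would cover all pillars.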

Scrape table with no ids or classes using only standard libraries?

I want to scrape two pieces of data from a website:
https://www.moneymetals.com/precious-metals-charts/gold-price
Specifically I want the "Gold Price per Ounce" and the "Spot Change" percent two columns to the right of it.
Using only Python standard libraries, is this possible? A lot of tutorials use the HTML element id to scrape effectively but inspecting the source for this page, it's just a table. Specifically I want the second and fourth <td> which appear on the page.
It's possible to do it with standard python libraries; ugly, but possible:
import urllib.request
from html.parser import HTMLParser
URL = 'https://www.moneymetals.com/precious-metals-charts/gold-price'
page = urllib.request.Request(URL)
result = urllib.request.urlopen(page)
# Decode the bytes to text; str(bytes) would leave a b'...' wrapper and literal \n sequences
resulttext = result.read().decode('utf-8', errors='replace')
class MyHTMLParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.gold = []  # every text node, in document order
    def handle_data(self, data):
        self.gold.append(data)
parser = MyHTMLParser()
parser.feed(resulttext)
for i in parser.gold:
    if 'Gold Price per Ounce' in i:
        target = parser.gold.index(i) #get the index location of the heading
        print(parser.gold[target+2]) #your target items are 2, 5 and 9 positions down in the list
        print(parser.gold[target+5].strip())
        print(parser.gold[target+9].strip())
Output (as of the time the url was loaded):
$1,566.70
8.65
0.55%
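Counting positions in a flat list of text nodes is fragile, since any extra whitespace node shifts the offsets. A slightly sturdier sketch, still standard library only, collects the text of each <td> so the "second and fourth <td>" from the question can be picked by index. The class name CellCollector and the SAMPLE markup are made up for illustration (the values are copied from the output above); the real page's markup may differ.

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    """Collect the text of every <td> element, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._buffer = None  # list of text chunks while inside a <td>, else None

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._buffer is not None:
            self.cells.append("".join(self._buffer).strip())
            self._buffer = None

# Made-up sample markup standing in for the downloaded page
SAMPLE = """
<table>
  <tr><td>Gold Price per Ounce</td><td>$1,566.70</td><td>8.65</td><td>0.55%</td></tr>
</table>
"""

collector = CellCollector()
collector.feed(SAMPLE)
collector.close()
price, change_pct = collector.cells[1], collector.cells[3]
```

On the live page, the decoded response text would be passed to feed() in place of SAMPLE.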

Simple Grouped Barplot (ggplot)

I want to do a barplot like this one (with different values):
Here's my dataframe:
Here's the top part of my code:
library("openxlsx")
library(ggplot2)
library("reshape")
df1<- read.xlsx("graficosparaR.xlsx", sheet = 'Hoja2', colNames = TRUE)
I tried using the ggplot function, but I can't get it to work (I'm a beginner R user).
Could you help me write the code for the barplot?

Can anyone help me understand this code (HTML table parsing in lxml, python)?

Background: I need to write an HTML table parser in Python for HTML tables with varying colspans and rowspans. After some research I stumbled upon this gem. It works well for simple cases without wacky colspans and rowspans; however, I've run into a bug. The code assumes that if an element has a colspan of 3, it belongs to three different table headers, while it really only belongs to the table header that the center of the colspan falls under. An example of this can be seen at http://en.wiktionary.org/wiki/han#Swedish (open up the declension table under the Swedish section). The code incorrectly returns that "hans" (possessive-neuter-3rd person masculine) also belongs to possessive-common-3rd person masculine and possessive-plural-3rd person masculine because it has a colspan of 3. I've tried adding a check to table_to_2d_dict that creates a counter if colspan > 1 and only counts the element as part of a header if the counter equals colspan // 2 + 1 (the median of range(1, colspan+1), which is the position of the table header the element should be counted under). However, when I implement this check in the location specified in the code below, it doesn't work. To be honest, this probably stems from my lack of understanding of how this code works, so...
Question: Can someone explain what this code does and why it malfunctions as described above? If someone can implement a fix that'd be great but right now I'm primarily concerned with understanding the code. Thanks
Below is the code with comments that I've added to highlight parts of the code I understand and parts I don't.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from collections import defaultdict
def table_to_list(table):
    dct = table_to_2d_dict(table)
    return list(iter_2d_dict(dct))
def table_to_2d_dict(table):
    result = defaultdict(lambda : defaultdict(str))
    for row_i, row in enumerate(table.xpath('./tr')): #these double for loops iterate over each element in the table
        for col_i, col in enumerate(row.xpath('./td|./th')):
            colspan = int(col.get('colspan', 1)) #gets colspan attr of the element, if none assumes it's 1
            rowspan = int(col.get('rowspan', 1)) #gets rowspan attr of the element, if none assumes it's 1
            col_data = col.text_content() #gets raw text inside element
            #WHAT DOES THIS DO? :(
            while row_i in result and col_i in result[row_i]:
                col_i += 1
            for i in range(row_i, row_i + rowspan):
                for j in range(col_i, col_i + colspan):
                    result[i][j] = col_data
    return result
#what does this do? :(
def iter_2d_dict(dct):
    for i, row in sorted(dct.items()):
        cols = []
        for j, col in sorted(row.items()):
            cols.append(col)
        yield cols
if __name__ == '__main__':
    import lxml.html
    from pprint import pprint
    doc = lxml.html.parse('tables.html')
    for table_el in doc.xpath('//table'):
        table = table_to_list(table_el)
        pprint(table)
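To make the two marked parts easier to follow, here is a small sketch of the same grid-filling idea on plain tuples instead of lxml elements. The function expand_cells and its center_only flag are hypothetical, not part of the original code. The while loop moves a cell to the right past slots that an earlier colspan or rowspan has already filled. The center_only branch shows one plausible reason the attempted fix fails: the non-center slots of a span still have to be written (here with an empty string), otherwise later cells in the row land in the wrong columns.

```python
from collections import defaultdict

def expand_cells(rows, center_only=False):
    """Expand rows of (text, colspan, rowspan) cells into a rectangular grid.

    Hypothetical helper mirroring table_to_2d_dict plus iter_2d_dict.
    """
    grid = defaultdict(dict)
    for row_i, row in enumerate(rows):
        for col_i, (text, colspan, rowspan) in enumerate(row):
            # Skip slots already claimed by an earlier cell's colspan or rowspan
            while col_i in grid[row_i]:
                col_i += 1
            # 0-based column of the middle slot (colspan // 2 + 1 in 1-based counting)
            center = col_i + colspan // 2
            for i in range(row_i, row_i + rowspan):
                for j in range(col_i, col_i + colspan):
                    if center_only and j != center:
                        grid[i][j] = ""  # reserve the slot without repeating the text
                    else:
                        grid[i][j] = text
    # Same job as iter_2d_dict: rows and columns in sorted index order
    return [[row[j] for j in sorted(row)] for _, row in sorted(grid.items())]

rows = [
    [("A", 1, 2), ("B", 3, 1)],              # A spans two rows, B spans three columns
    [("c", 1, 1), ("d", 1, 1), ("e", 1, 1)],
]
full = expand_cells(rows)
centered = expand_cells(rows, center_only=True)
```

In the second row, "c" starts at column 0 but is pushed to column 1 because "A" already occupies column 0 through its rowspan; that shift is exactly what the while loop is for.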

Issue in outputting data scraped using beautiful soup in two columns of csv using spamwriter.writerow

I am scraping two sets of data from a website using Beautiful Soup, and I want to output them to a CSV file in two columns side by side. I am using spamwriter.writerow([x,y]) for this, but I think that because of some error in my loop structure I am getting the wrong output in my CSV file. Below is the code in question:
import csv
import urllib2
import sys
from bs4 import BeautifulSoup
page = urllib2.urlopen('http://www.att.com/shop/wireless/devices/smartphones.html').read()
soup = BeautifulSoup(page)
soup.prettify()
with open('Smartphones_20decv2.0.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    for anchor in soup.findAll('a', {"class": "clickStreamSingleItem"},text=True):
        if anchor.string:
            print unicode(anchor.string).encode('utf8').strip()
    for anchor1 in soup.findAll('div', {"class": "listGrid-price"}):
        textcontent = u' '.join(anchor1.stripped_strings)
        if textcontent:
            print textcontent
            spamwriter.writerow([unicode(anchor.string).encode('utf8').strip(),textcontent])
Output which I am getting in csv is:
Samsung Focus® 2 (Refurbished) $99.99
Samsung Focus® 2 (Refurbished) $99.99 to $199.99 8 to 16 GB
Samsung Focus® 2 (Refurbished) $0.99
Samsung Focus® 2 (Refurbished) $0.99
Samsung Focus® 2 (Refurbished) $149.99 to $349.99 16 to 64 GB
The problem is that I am getting only one device name in column 1 instead of all of them, while the prices come through for all devices.
Please pardon my ignorance as I am new to programming.
You are using anchor.string instead of anchor1. anchor is the last item from the previous loop, not the item in the current loop.
Perhaps using clearer variable names would help avoid confusion here; use singleitem and gridprice perhaps?
It could be I misunderstood though and you want to combine each anchor1 with a corresponding anchor. You'll have to loop over them together, perhaps using zip():
items = soup.findAll('a', {"class": "clickStreamSingleItem"},text=True)
prices = soup.findAll('div', {"class": "listGrid-price"})
for item, price in zip(items, prices):
    textcontent = u' '.join(price.stripped_strings)
    if textcontent:
        print textcontent
        spamwriter.writerow([unicode(item.string).encode('utf8').strip(),textcontent])
Normally it should be easier to loop over the parent table row instead, then find the cells within that row within a loop. But the zip() should work too, provided the clickStreamSingleItem cells line up with the listGrid-price matches.
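For readers on Python 3, the zip() pairing can be sketched without the scraping part. The names and prices below are placeholder lists standing in for the Beautiful Soup results, and write_pairs is a hypothetical helper; an in-memory buffer is used so the example runs without touching the file system.

```python
import csv
import io

def write_pairs(names, prices, out):
    """Write one CSV row per (name, price) pair, skipping empty prices."""
    writer = csv.writer(out, delimiter=',')
    for name, price in zip(names, prices):
        if price:
            writer.writerow([name.strip(), price])

# Placeholder data standing in for the scraped device names and prices
names = ["Phone A ", "Phone B", "Phone C"]
prices = ["$99.99", "$0.99", "$149.99 to $349.99"]

buffer = io.StringIO()
write_pairs(names, prices, buffer)
csv_text = buffer.getvalue()
```

When writing to a real file in Python 3, open it in text mode with newline='' (for example open(path, 'w', newline='')) rather than 'wb'. Note that zip() stops at the shorter list, so a missing price silently drops the trailing names.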