Stop Jupyter notebook wrapping cell contents in pandas HTML table output

The pandas option max_colwidth controls how many characters will be included in the repr of a dataframe:
import string, random
import pandas as pd
df = pd.DataFrame([''.join(random.choice(string.ascii_lowercase + ' ') for j in range(1000)) for i in range(4)])
pd.options.display.max_colwidth = 10
print(df)
yields
0
0 lmftge...
1 pqttqb...
2 wi wgy...
3 ow dip...
and
pd.options.display.max_colwidth = 30
print(df)
yields
0
0 lmftgenioerszvgzfaxorzciow...
1 pqttqbqqe pykgguxnjsspbcti...
2 wi wgybtgcbxkobrwnaxpxwsjc...
3 ow dippaiamvvcofvousieckko...
And you can set pd.options.display.max_colwidth = 0 to remove the limit altogether. Fine so far!
But if the dataframe is rendered as HTML inside a notebook, the notebook wraps the column of the table to the width of the display, regardless of this setting.
Is there any way to avoid this, i.e. to have the HTML table column rendered as wide as is necessary to fit each row on a single line?
More generally, is it possible to control the width of HTML table columns in notebook output independent of the number of characters in the pandas output?

Building on Ben's answer, but without needing to go into the custom CSS files, which work differently for JupyterLab.
Just put this in a cell and run it:
%%html
<style>
.dataframe td {
white-space: nowrap;
}
</style>
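If you would rather switch the rule on from Python code than from a %%html cell, the same style block can be built as a string. This is only a minimal sketch: the helper name nowrap_css is made up for illustration, and displaying the result assumes IPython.display is available, as it is in a standard Jupyter kernel.

```python
def nowrap_css(selector=".dataframe td"):
    """Return a <style> block that stops cells matching `selector` from wrapping.

    `nowrap_css` is a hypothetical helper, not part of pandas or Jupyter.
    """
    return (
        "<style>\n"
        + selector + " {\n"
        + "    white-space: nowrap;\n"
        + "}\n"
        + "</style>"
    )


style_block = nowrap_css()
print(style_block)

# In a notebook cell you would then render it, for example with:
#     from IPython.display import HTML, display
#     display(HTML(style_block))
```

Because the rule is injected into the page, it applies to every dataframe rendered afterwards, just like the %%html version.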

If you create a file custom.css in ~/.jupyter/custom (for example with: ~/.jupyter/custom$ atom custom.css) and then put this in it:
.dataframe td {
white-space: nowrap;
}
Then it will force the cell to show as one line, but then you get a scrolling table.
If you want it to not scroll, then set:
div.output_subarea {
overflow-x: inherit;
}
and then it'll be as wide as it needs to be:
It's not super pretty, but I'm sure that you can tidy it up if needs be.
I found this very helpful. You'll also need to restart the notebook after you first create the css file for it to register, but from then on you can just refresh the page to see the changes to the css take effect.
This was the notebook that I was testing on.

Another option if you prefer to do it with 1 line of code or if you need to have different formats in different parts of your notebook, is to use Pandas Styler:
dfs = df.style.set_table_styles([dict(selector="td", props=[('white-space', 'nowrap')])])
display(dfs)
It is based on CSS so if in the middle of the notebook you want to go back to the previous format, you can write:
dfs = df.style.set_table_styles([dict(selector="td", props=[('overflow', 'hidden'),
('text-overflow', 'ellipsis'), ('max-width', '120px')])])


Picture can't be displayed when adding picture to copied table using python-docx

I'm using python-docx to edit an existing word file. I've created a simpler case where it also fails starting with a blank document. Adding a picture to a cell in a table works just as expected. But if I want to copy that table, there seems to be a problem when I add an image to the new table. I've looked elsewhere, but this problem seems to be unique to copied tables only.
from copy import deepcopy
from docx import Document
from docx.shared import Cm  # needed for width=Cm(10) below
doc = Document()
table1 = doc.add_table(rows=2, cols=2)
table1.style = 'TableGrid'
table2 = deepcopy(table1)
table2._tbl = deepcopy(table1._tbl)
p = doc.add_paragraph()
p._p.addnext(table2._tbl)
# Add image to table 1
cell = table1.rows[0].cells[0]
cell.paragraphs[0].add_run().add_picture("picture.png", width=Cm(10))
# Add image to table 2
cell = table2.rows[0].cells[0]
cell.paragraphs[0].add_run().add_picture("picture.png", width=Cm(10))
doc.save("t.docx")
Problem: If I add the same image to both, it will appear correct. If I add it to only table 1, it also works. But if I add it to the copy, then it fails and looks like this:
It seems to be something that is different in the second, copied table. I know it looks like a hack the way it was created. I did it this way because, in my original document, the table is quite complex and needs to be duplicated for each case.
(Edit: added missing deepcopy import)

Make HTML table scrollable

In a project report I want to include some DataFrames.
Normally I generate the report with Markdown, but some tables are too wide to display nicely.
So I tried to export to HTML with the code below:
# 1. Set up multiple variables to store the titles, text within the report
page_title_text='My report'
title_text = 'Scrollable table'
text = 'Hello, welcome to your Scrollable table test!'
prices_text = 'Twitter Data'
# 2. Combine them together using a long f-string
html = f'''
<html>
<head>
<title>{page_title_text}</title>
</head>
<body>
<h1>{title_text}</h1>
<p>{text}</p>
<h2>{prices_text}</h2>
{wrd_archive.head(5).to_html()}
</body>
</html>
'''
# 3. Write the html string as an HTML file
with open('html_report.html', 'w') as f:
    f.write(html)
The output looks like the screenshot below. I have been searching for hours for a way to make this table scrollable on the vertical axis and (that would be nice) vertically scrollable with a fixed height and fixed headers.
I don't know why I am so lost today, but I can't find the right solution.
Can someone please help? ;)
Screenshot
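One common way to get a scrolling table with a fixed height and fixed headers is to wrap the table in a div that scrolls and make the header cells sticky. The sketch below is only an illustration under assumptions: the helper scrollable_table, the CSS class table-scroll, and the 300px default are invented here, and the literal table string stands in for the output of wrd_archive.head(5).to_html().

```python
def scrollable_table(table_html, max_height="300px"):
    """Wrap an HTML table in a scrolling <div> whose header row stays visible.

    Hypothetical helper: the class name `table-scroll` is an arbitrary choice.
    """
    css = (
        "<style>\n"
        # overflow: auto adds scrollbars on either axis only when needed.
        ".table-scroll { max-height: " + max_height + "; overflow: auto; }\n"
        # Sticky header cells stay at the top of the scrolling container.
        ".table-scroll th { position: sticky; top: 0; background: white; }\n"
        "</style>\n"
    )
    return css + '<div class="table-scroll">\n' + table_html + "\n</div>"


# Stand-in for wrd_archive.head(5).to_html(), which returns a <table> string.
table_html = (
    "<table><thead><tr><th>a</th></tr></thead>"
    "<tbody><tr><td>1</td></tr></tbody></table>"
)
wrapped = scrollable_table(table_html)
print(wrapped)
```

In the report, the wrapped string would take the place of the bare to_html() output inside the f-string.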

Selenium, using find_element but end up with half the website

I finished the linked tutorial and tried to modify it to get something else from a different website. I am trying to get the margin table of HHI, but the website is coded in such a strange way that I am quite confused.
I found the child element with the XPath //a[@name="HHI"]; its parent is <font size="2"></font>, which contains the text I want, but there are many tags exactly like <font size="2"></font>, so I can't just use the XPath //font[@size="2"].
Using the full XPath prints out half of the website's content.
the full xpath:
/html/body/table/tbody/tr/td/table/tbody/tr/td/table/tbody/tr[3]/td/pre/font/table/tbody/tr/td[2]/pre/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font
Is there any way to select that particular font tag and print the text?
website:
https://www.hkex.com.hk/eng/market/rm/rm_dcrm/riskdata/margin_hkcc/merte_hkcc.htm
Tutorial
https://www.youtube.com/watch?v=PXMJ6FS7llk&t=8740s&ab_channel=freeCodeCamp.org
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
import pandas as pd
# prepare it to automate
from datetime import datetime
import os
import sys
import csv
application_path = os.path.dirname(sys.executable) # export the result to the same file as the executable
now = datetime.now() # for modify the export name with a date
month_day_year = now.strftime("%m%d%Y") # MMDDYYYY
website = "https://www.hkex.com.hk/eng/market/rm/rm_dcrm/riskdata/margin_hkcc/merte_hkcc.htm"
path = "C:/Users/User/PycharmProjects/Automate with Python – Full Course for Beginners/venv/Scripts/chromedriver.exe"
# headless-mode
options = Options()
options.add_argument("--headless")  # options.headless was removed in recent Selenium 4 releases
service = Service(executable_path=path)
driver = webdriver.Chrome(service=service, options=options)
driver.get(website)
containers = driver.find_element(by="xpath", value='') # or find_elements
hhi = containers.text # if using find_elements, = containers[0].text
print(hhi)
Update:
Thank you to Conal Tuohy; I learned a few new tricks in XPath. The website is written in such a strange way that even with an XPath that locates the exact font tag, the result still prints all the text in every following tag.
I tried to make a list of the different products with .split("Back to Top"), then sliced out the first item and used .split("\n"). I will keep splitting the lists within the list until they fit neatly into a dataframe with strike prices as the index and maturity dates as the columns.
Probably not the most efficient way but it works for now.
product = "HHI"
containers = driver.find_element(by="xpath", value=f'//font[a/@name="{product}"]')
hhi = containers.text.split("Back to Top")
# print(hhi)
hhi1 = hhi[0].split("\n")
df = pd.DataFrame(hhi1)
# print(df)
df.to_csv(f"{product}_{month_day_year}.csv")
You're right that HTML is just awful! But if you're after the text of the table, it seems to me you ought to select the text node that follows the b element that follows the a[@name="HHI"]; something like this:
//a[@name="HHI"]/following-sibling::b/following-sibling::text()[1]
EDIT
Of course that XPath won't work in Selenium because it identifies a text node rather than an element. So your best option is to return the font element that directly contains the //a[@name="HHI"], which will include some cruft (the Back to Top link, etc.) but which will at least contain the tabular data you want:
//a[@name="HHI"]/parent::font
i.e. "the parent font element of the a element whose name attribute equals HHI"
or equivalently:
//font[a/@name="HHI"]
i.e. "the font element which has, among its child a elements, one whose name attribute equals HHI"
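To see the "select the parent, then strip the cruft" idea without a browser, here is a small sketch that uses the standard library's xml.etree.ElementTree in place of Selenium. The markup in SNIPPET is invented for illustration and is not the real HKEX page; ElementTree also supports only a subset of XPath, so the parent step is written as /.. rather than /parent::font.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed stand-in for the page: one <font> block per product.
SNIPPET = """
<pre>
  <font size="2"><a name="HSI"/><b>HSI</b>
row-a 1 2
<a href="#top">Back to Top</a></font>
  <font size="2"><a name="HHI"/><b>HHI</b>
row-b 3 4
row-c 5 6
<a href="#top">Back to Top</a></font>
</pre>
"""

root = ET.fromstring(SNIPPET)

# Find the <a name="HHI"> anchor, then step up to the element that contains it.
font = root.find(".//a[@name='HHI']/..")

# Like Selenium's .text, this gathers the text of all nested elements too.
text = "".join(font.itertext())

# Drop the trailing "Back to Top" cruft and split the rest into rows of fields.
body = text.split("Back to Top")[0]
rows = [line.split() for line in body.splitlines() if line.strip()]
print(rows)
```

The resulting list of lists can be handed to pd.DataFrame in the same way as the split text in the update above.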

How to vertically align comma separated values in Notepad++?

As shown in the picture "Before" below, the columns separated by commas are not aligned neatly. Is there any method to align each column vertically, like the display in Excel?
The effect I want is shown in the picture "After".
Thanks to @Martin S, I can align the file as in the picture "Method_1". As he mentioned, some characters still do not align well. I was wondering if this method could be improved?
You can use the TextFX plugin:
TextFX > TextFX Edit > Line up multiple lines by ...
Note: This doesn't work if the file is read only.
http://tomaslind.net/2016/02/18/how-to-align-columns-in-notepad/
Update 2019: Download link from SourceForge
Maybe not exactly what you're looking for, but I recently added a CSV Lint plug-in to Notepad++ which also adds syntax highlighting for csv and fixed width data files, meaning each column gets a different color so it's easier to see.
You can use this python plugin script which utilizes the csv library which takes care of quoted csv and many other variants.
Setup:
1. Use the plugin manager in Notepad++ to install the "Python Script" plugin.
2. Go to Plugins->Python Script->New Script (name it something like CSVtoTable.py).
3. Paste the following Python script into the new file and save:
CSVtoTable.py
import csv
inputlines = editor.getText().split('\n')
# Get rid of empty lines
inputlines = [line.strip() for line in inputlines if line.strip()]
reader = csv.reader(inputlines, delimiter=',')
csvlist = [line for line in reader]
# transpose to calculate the column widths and create a format string which left aligns each row
t_csvlist = list(zip(*csvlist))  # list() so the columns can be iterated under Python 3 as well
col_widths = [max(len(x) for x in col) for col in t_csvlist]
# To right align - change < to >
fmt_str = ' '.join(['{{:<{0}}}'.format(x) for x in col_widths]) + '\r\n'
text = []
for line in csvlist:
    text.append(fmt_str.format(*line))
# open a new document and put the results in there.
notepad.new()
editor.addText(''.join(text))
Open your CSV file in Notepad++.
Click on Plugins->Python Script->Scripts->(the name you used in step 2).
A new tab with the formatted data should open.
Update (right aligned numbers & left aligned strings):
Use the following python script if you want to right align number fields from the CSV - it looks at the second line of the csv to determine the types of the fields.
import csv
import re
num_re = re.compile(r'[-+]?\d+(\.\d+)?')  # raw string avoids invalid-escape warnings
inputlines = editor.getText().split('\n')
# Get rid of empty lines
inputlines = [line.strip() for line in inputlines if line.strip()]
reader = csv.reader(inputlines, delimiter=',')
csvlist = [line for line in reader]
# Transpose to calculate the column widths and create a format string which left aligns each row
t_csvlist = list(zip(*csvlist))  # list() so the columns can be iterated under Python 3 as well
col_widths = [max(len(x) for x in col) for col in t_csvlist]
# Numbers get right aligned
type_eval_line = csvlist[1 if len(csvlist)>1 else 0]
alignment = ['>' if num_re.match(item) else '<' for item in type_eval_line]
# Compute the format string
fmt_str = ' '.join(['{{:{0}{1}}}'.format(a,x) for x,a in zip(col_widths,alignment)]) + '\r\n'
text = []
for line in csvlist:
    text.append(fmt_str.format(*line))
# open a new document and put the results in there.
notepad.new()
editor.addText(''.join(text))
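For trying the alignment logic outside Notepad++, the same idea can be sketched as a plain Python 3 function with no dependency on the plugin's editor and notepad objects. The function name align_csv and the sample data are made up for illustration; this version also pads short rows, which the plugin script above does not do.

```python
import csv
import io


def align_csv(text, sep=" "):
    """Return CSV text with every column left-aligned to its widest cell."""
    # csv.reader handles quoted fields, so "Smith, J" stays one cell.
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return ""
    n_cols = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells.
    rows = [row + [""] * (n_cols - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in rows) for i in range(n_cols)]
    lines = [
        sep.join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    ]
    return "\n".join(lines)


sample = 'name,qty\n"Smith, J",5\nLi,120\n'
aligned = align_csv(sample)
print(aligned)
```

Swapping ljust for rjust would right-align a column, which mirrors the < and > switch in the format strings above.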
Notepad++ CSVLint
Install CSVLint Plugin
Open CSV file. Or manually set Language > CSVLint. This will give you nicely colored output.
To reformat do this:
Open lower pane: Plugins > CSV Lint > CSV Lint Window.
Click the Reformat button. Check the box Align vertically (not recommended). -- This may screw up your data, so think twice before clicking OK.
Reformatted output:
If you want to try this yourself: Here is my sample input:
TIMESTAMP_START,TIMESTAMP_END,TA_ERA,TA_ERA_NIGHT,TA_ERA_NIGHT_SD,TA_ERA_DAY,DA_ERA_DAY_SD,SW_IN_ERA,HH,DD,WW-YY,SW_IN_F,HH
19890101,19890107,3.436,1.509,2.165,6.134,2.889,100.233,283.946,1.373,99.852,2.748,1.188
19890108,19890114,3.814,2.446,2.014,5.728,2.526,91.708,286.451,1.575,100,100.841,0.742
You could use Search&Replace to change all occurrences of , to ,\t. This will add a tab after each ,.
This method does, however, have some drawbacks:
You effectively add white-space characters to your document (which matters if you need to edit and save it).
It works well only if the difference (in number of characters) between the longest and the shortest values in a column is less than one tab size (usually 4 characters).
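The tab-size limitation can be checked with a short sketch that uses str.expandtabs to mimic how an editor with a tab size of 4 would render the text. The two sample lines are made up; their first fields differ in length by more than one tab size.

```python
TAB_SIZE = 4

# Made-up sample: the first fields are 1 and 5 characters long.
lines = ["1,22222,3", "11111,2,3"]

# The Search&Replace step: put a tab after every comma.
tabbed = [line.replace(",", ",\t") for line in lines]

# Render the tabs as an editor with TAB_SIZE-wide tab stops would.
rendered = [line.expandtabs(TAB_SIZE) for line in tabbed]
for line in rendered:
    print(line)

# Column at which the second field starts in each rendered line.
starts = [len((t.split("\t")[0] + "\t").expandtabs(TAB_SIZE)) for t in tabbed]
print(starts)
```

The second field starts at a different column in each line, so the columns do not line up once the lengths differ by a tab size or more.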

In R package Formattable, how to apply digits and conditional formatting at the same time?

I have the object TABLE_LIST, which is a list of tables (I can't provide the contents for privacy reasons, sorry).
I first created the object TABLE_LIST (It is a list of data.frames 2x12)
TABLE_LIST=lapply(1:4, function(x) data.frame(rbind(total.ratio4[[x]][-(1)], total.ratio2[[x]][-(1)]), row.names=row))
The following code gives me red and green font colors based on the value on the cell, and it works like a charm:
formattable(TABLE_LIST[[1]], list(area(,-(c(5,10)))~formatter("span", style=x~style(color=ifelse(x>1,"red","green"))),area(,(c(5,10)))~formatter("span", style=x~style(color=ifelse(x>1,"green","red")))))
However, I need COLOR AND comma separated numbers. My failed attempt is:
formattable(TABLE_LIST[[1]], list(area(,-(c(5,10)))~formatter("span", style=x~style(color=ifelse(x>1,"red","green"))),area(,(c(5,10)))~formatter("span", style=x~style(color=ifelse(x>1,"green","red"),digits(x,2))),
area(1:2,1:10)~formatter("span",x~ style(digits(x,2)))))
This code works well, but erases the formatting of the color. I do not know what else to do.
I have to mention I cannot change the original data.frame without messing everything up. So I gotta make the changes on table_list or formattable. Thank you.
I think I solved it. So I will share this small knowledge to people who may have the same problems as me:
formattable(TABLE_LIST[[1]],
list(
area(,-(c(5,10)))~formatter("span",
style=x~style(color=ifelse(x>1,"red","green")),
x~style(digits(x,4))),
area(,(c(5,10)))~formatter("span",
style=x~style(color=ifelse(x>1,"green","red")),
x~style(digits(x,4)))))
Basically, inside the same formatter, on the level of style, add a comma and x~style.