How to vertically align comma separated values in Notepad++? - csv

As shown in the "Before" picture below, the comma-separated columns are not neatly aligned. Is there a way to align each column vertically, the way Excel displays them?
The effect I want is shown in the "After" picture.
Thanks to @Martin S, I can align the file as shown in the "Method_1" picture. As he mentioned, some characters still do not align well. Can this method be improved?

You can use the TextFX plugin:
TextFX > TextFX Edit > Line up multiple lines by ...
Note: This doesn't work if the file is read only.
http://tomaslind.net/2016/02/18/how-to-align-columns-in-notepad/
Update 2019: Download link from SourceForge

Maybe not exactly what you're looking for, but I recently added a CSV Lint plug-in to Notepad++ which also adds syntax highlighting for csv and fixed width data files, meaning each column gets a different color so it's easier to see.

You can use this Python plugin script, which relies on the csv library and therefore handles quoted CSV and many other variants.
Setup:
1. Use the plugin manager in Notepad++ to install the "Python script" plugin.
2. Plugins->Python Script->New Script (name it something like CSVtoTable.py)
3. Paste the following Python script into the new file and save:
CSVtoTable.py
import csv
inputlines = editor.getText().split('\n')
# Get rid of empty lines
inputlines = [line.strip() for line in inputlines if line.strip()]
reader = csv.reader(inputlines, delimiter=',')
csvlist = [line for line in reader]
# Transpose to calculate the column widths and create a format string which left aligns each row
# (list() keeps the transposed rows indexable under Python 3 as well)
t_csvlist = list(zip(*csvlist))
col_widths = [max([len(x) for x in t_csvlist[y]]) for y in range(len(t_csvlist))]
# To right align - change < to >
fmt_str = ' '.join(['{{:<{0}}}'.format(x) for x in col_widths]) + '\r\n'
text = []
for line in csvlist:
    text.append(fmt_str.format(*line))
# Open a new document and put the results in there.
notepad.new()
editor.addText(''.join(text))
4. Open your CSV file in Notepad++.
5. Click on Plugins->Python Script->Scripts->(The name you used in step 2)
6. A new tab with the formatted data should open.
Update (right aligned numbers & left aligned strings):
Use the following python script if you want to right align number fields from the CSV - it looks at the second line of the csv to determine the types of the fields.
import csv
import re
# Matches whole fields that are integers or decimals, e.g. 42, -3.5
num_re = re.compile(r'[-+]?\d+(\.\d+)?$')
inputlines = editor.getText().split('\n')
# Get rid of empty lines
inputlines = [line.strip() for line in inputlines if line.strip()]
reader = csv.reader(inputlines, delimiter=',')
csvlist = [line for line in reader]
# Transpose to calculate the column widths
# (list() keeps the transposed rows indexable under Python 3 as well)
t_csvlist = list(zip(*csvlist))
col_widths = [max([len(x) for x in t_csvlist[y]]) for y in range(len(t_csvlist))]
# Numbers get right aligned; the second line decides each column's type
type_eval_line = csvlist[1 if len(csvlist)>1 else 0]
alignment = ['>' if num_re.match(item) else '<' for item in type_eval_line]
# Compute the format string
fmt_str = ' '.join(['{{:{0}{1}}}'.format(a,x) for x,a in zip(col_widths,alignment)]) + '\r\n'
text = []
for line in csvlist:
    text.append(fmt_str.format(*line))
# Open a new document and put the results in there.
notepad.new()
editor.addText(''.join(text))
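If you want to try the same idea outside Notepad++, here is a minimal sketch in plain Python. It does not use the plugin's editor or notepad objects; the names align_csv and NUM_RE are made up for this example, and it assumes the whole file fits in memory. As in the script above, the second row decides which columns are right aligned.

```python
import csv
import io
import re

# Whole-field match for integers and decimals such as 42 or -3.5
NUM_RE = re.compile(r'[-+]?\d+(\.\d+)?$')


def align_csv(text, delimiter=','):
    """Return the CSV text with every column padded to a common width."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    if not rows:
        return ''
    n_cols = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of fields.
    rows = [row + [''] * (n_cols - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in rows) for i in range(n_cols)]
    # The second row (first data row) decides the alignment of each column.
    sample = rows[1] if len(rows) > 1 else rows[0]
    right = [bool(NUM_RE.match(field)) for field in sample]
    lines = []
    for row in rows:
        cells = [f.rjust(w) if r else f.ljust(w)
                 for f, r, w in zip(row, right, widths)]
        lines.append(' '.join(cells).rstrip())
    return '\n'.join(lines)


print(align_csv("name,qty\napple,3\nkiwi,12\n"))
```

Because the csv module does the parsing, a quoted field such as "b,c" stays in one column.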

Notepad++ CSVLint
Install CSVLint Plugin
Open CSV file. Or manually set Language > CSVLint. This will give you nicely colored output.
To reformat do this:
Open lower pane: Plugins > CSV Lint > CSV Lint Window.
Click the Reformat button. Check the box Align vertically (not recommended). -- This may screw up your data, so think twice before clicking OK.
Reformatted output:
If you want to try this yourself: Here is my sample input:
TIMESTAMP_START,TIMESTAMP_END,TA_ERA,TA_ERA_NIGHT,TA_ERA_NIGHT_SD,TA_ERA_DAY,DA_ERA_DAY_SD,SW_IN_ERA,HH,DD,WW-YY,SW_IN_F,HH
19890101,19890107,3.436,1.509,2.165,6.134,2.889,100.233,283.946,1.373,99.852,2.748,1.188
19890108,19890114,3.814,2.446,2.014,5.728,2.526,91.708,286.451,1.575,100,100.841,0.742

You could use Search&Replace (with Search Mode set to Extended) to change all occurrences of , to ,\t. This will add a tab after each comma.
This method has some drawbacks, however:
It adds white-space characters to your document (which matters if you need to edit and save it).
It works well only if the difference in length between the longest and the shortest values is less than one tab size (usually 4 characters).
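To see where the tab approach breaks down, here is a small Python sketch that mimics how an editor with a tab size of 4 would place the fields. The helper names tab_view and field_starts are made up for this example, and it assumes that no field contains a comma of its own.

```python
def tab_view(line, tab_size=4):
    """Insert a tab after every comma and expand tabs as an editor would."""
    return line.replace(',', ',\t').expandtabs(tab_size)


def field_starts(line, tab_size=4):
    """Return the display column where each field after the first begins."""
    tabbed = line.replace(',', ',\t')
    starts = []
    for i, ch in enumerate(tabbed):
        if ch == '\t':
            # Width of everything up to and including this tab.
            starts.append(len(tabbed[:i + 1].expandtabs(tab_size)))
    return starts


for line in ("1,2", "12,2", "123,2", "12345,2"):
    print(repr(tab_view(line)), field_starts(line))
```

In this sketch "1,2" and "12,2" put the second field at column 4, while "123,2" and "12345,2" push it to column 8. So even a small difference in length can misalign the columns when a value happens to cross a tab stop.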

Related

Selenium, using find_element but end up with half the website

I finished the linked tutorial and tried to modify it to get something else from a different website. I am trying to get the margin table of HHI, but the website is coded in such a strange way that I am quite confused.
I can find the child element with the XPath //a[@name="HHI"]. Its parent is <font size="2"></font>, which contains the text I want, but there are many tags that look exactly like <font size="2"></font>, so I can't just use the XPath //font[@size="2"].
Using the full XPath prints out half of the website's content.
the full xpath:
/html/body/table/tbody/tr/td/table/tbody/tr/td/table/tbody/tr[3]/td/pre/font/table/tbody/tr/td[2]/pre/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font
Is there any way to select that particular font tag and print its text?
website:
https://www.hkex.com.hk/eng/market/rm/rm_dcrm/riskdata/margin_hkcc/merte_hkcc.htm
Tutorial
https://www.youtube.com/watch?v=PXMJ6FS7llk&t=8740s&ab_channel=freeCodeCamp.org
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
import pandas as pd
# prepare it to automate
from datetime import datetime
import os
import sys
import csv
application_path = os.path.dirname(sys.executable) # export the result to the same folder as the executable
now = datetime.now() # used to add the date to the export name
month_day_year = now.strftime("%m%d%Y") # MMDDYYYY
website = "https://www.hkex.com.hk/eng/market/rm/rm_dcrm/riskdata/margin_hkcc/merte_hkcc.htm"
path = "C:/Users/User/PycharmProjects/Automate with Python – Full Course for Beginners/venv/Scripts/chromedriver.exe"
# headless-mode
options = Options()
options.headless = True
service = Service(executable_path=path)
driver = webdriver.Chrome(service=service, options=options)
driver.get(website)
containers = driver.find_element(by="xpath", value='') # or find_elements
hhi = containers.text # if using find_elements, = containers[0].text
print(hhi)
Update:
Thanks to Conal Tuohy, I learned a few new XPath tricks. The website is written in such a strange way that even with an XPath that locates the exact font tag, the result still includes the text of every following tag.
I made a list of the different products with .split("Back to Top"), took the first item, and applied .split("\n"). I will keep splitting the nested lists until they fit neatly into a dataframe with strike prices as the index and maturity dates as the columns.
Probably not the most efficient way but it works for now.
product = "HHI"
containers = driver.find_element(by="xpath", value=f'//font[a/@name="{product}"]')
hhi = containers.text.split("Back to Top")
# print(hhi)
hhi1 = hhi[0].split("\n")
df = pd.DataFrame(hhi1)
# print(df)
df.to_csv(f"{product}_{month_day_year}.csv")
You're right that HTML is just awful! But if you're after the text of the table, it seems to me you ought to select the text node that follows the b element that follows the a[@name="HHI"]; something like this:
//a[@name="HHI"]/following-sibling::b/following-sibling::text()[1]
EDIT
Of course that XPath won't work in Selenium because it identifies a text node rather than an element. So your best option is to return the font element that directly contains the //a[@name="HHI"], which will include some cruft (the Back to Top link, etc.) but which will at least contain the tabular data you want:
//a[@name="HHI"]/parent::font
i.e. "the parent font element of the a element whose name attribute equals HHI"
or equivalently:
//font[a/@name="HHI"]
i.e. "the font element which has, among its child a elements, one whose name attribute equals HHI"
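To experiment with the "parent of the named anchor" idea without a browser, here is a rough sketch using Python's standard xml.etree.ElementTree, whose limited XPath support includes the .. parent step. The SAMPLE markup and the helper name text_of_font_containing are invented for this example. The real page is not well-formed XML, so this only illustrates the selection logic and is not a replacement for Selenium.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page's nested <font> markup.
SAMPLE = """<html><body>
<font size="2"><a name="HSI"/><b>HSI</b> other data
<font size="2"><a name="HHI"/><b>HHI</b> margin data <a href="#top">Back to Top</a></font>
</font>
</body></html>"""


def text_of_font_containing(root, anchor_name):
    """Return all text inside the element that directly contains <a name=...>."""
    # Mirrors //a[@name="HHI"]/parent::font in the subset ElementTree supports.
    font = root.find(".//a[@name='%s']/.." % anchor_name)
    return None if font is None else "".join(font.itertext())


root = ET.fromstring(SAMPLE)
hhi_text = text_of_font_containing(root, "HHI")
# Drop the trailing cruft the same way the question's update does.
print(hhi_text.split("Back to Top")[0].strip())
```

Because the font elements are nested, selecting the one that contains the HSI anchor in this sample also returns the HHI text, which matches the "prints every following tag" behaviour described in the question.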

Flextable : using superscript in the dataframe

This question has been asked a few times but, surprisingly, no answer was given.
I want some numbers in my dataframe to appear in superscript.
The functions compose and display are not suitable here, since I don't know in advance which values in my dataframe will appear in superscript (my tables are generated automatically).
I tried ^8^ (as for kable), $$10^-3$$, paste(expression(10^2)), "H\\textsubscript{123}", etc.
Nothing works! Help! I am pulling my hair out...
library(flextable)
bab = data.frame(c( "10\\textsubscript{-3}",
paste(as.expression(10^-3)), '10%-3%', '10^-2^' ))
flextable(bab)
I am knitting from R to HTML.
In HTML, you do superscripts using things like <sup>-3</sup>, and subscripts using <sub>-3</sub>. However, if you put these into a cell in your table, you'll see the full text displayed, it won't be interpreted as HTML, because flextable escapes the angle brackets.
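To make the escaping concrete, here is a tiny sketch in Python rather than R. It uses Python's own html.escape as a stand-in for what an HTML escaper does to a cell value, so treat it as an analogy and not as flextable's actual code.

```python
import html

cell = "10<sup>-3</sup>"
escaped = html.escape(cell)
# The angle brackets become entities, so a browser shows the tags as literal text.
print(escaped)
```

Once the cell has been escaped like this, the browser has no sup element left to render.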
The kable() function has an argument escape = FALSE that can turn this off, but flextable doesn't: see https://github.com/davidgohel/flextable/issues/156. However, there's a hackish way to get around this limitation: replace the htmlEscape() function with a function that does nothing.
For example,
```{r}
library(flextable)
env <- parent.env(loadNamespace("flextable")) # The imports
unlockBinding("htmlEscape", env)
assign("htmlEscape", function(text, attribute = FALSE) text, envir=env)
lockBinding("htmlEscape", env)
bab = data.frame(x = "10<sup>-3</sup>")
flextable(bab)
```
This will display the table with -3 rendered as a superscript.
Be careful if you do this: there may be cases in your real tables where you really do want HTML escapes, and this code will disable that for the rest of the document. If you execute this code in an R session, it will disable escaping for the rest of the session.
And if you were thinking of using a document like this in a package you submit to CRAN, forget it. You shouldn't be messing with bindings like this in code that you expect other people to use.
Edited to add:
In fact, there's a way to do this without the hack given above. It's described in this article: https://davidgohel.github.io/flextable/articles/display.html#sugar-functions-for-complex-formatting. The idea is to replace the entries that need superscripts or subscripts with calls to as_paragraph, as_sup, as_sub, etc.:
```{r}
library(flextable)
bab <- data.frame(x = "dummy")
bab <- flextable(bab)
bab <- compose(bab, part = "body", i = 1, j = 1,
value = as_paragraph("10",
as_sup("-3")))
bab
```
This is definitely safer than the method I gave.

Stop jupyter notebook wrapping cell contents in pandas html table output

The pandas option max_colwidth controls how many characters will be included in the repr of a dataframe:
import string, random
import pandas as pd
df = pd.DataFrame([''.join(random.choice(string.ascii_lowercase + ' ') for j in range(1000)) for i in range(4)])
pd.options.display.max_colwidth = 10
print(df)
yields
0
0 lmftge...
1 pqttqb...
2 wi wgy...
3 ow dip...
and
pd.options.display.max_colwidth = 30
print(df)
yields
0
0 lmftgenioerszvgzfaxorzciow...
1 pqttqbqqe pykgguxnjsspbcti...
2 wi wgybtgcbxkobrwnaxpxwsjc...
3 ow dippaiamvvcofvousieckko...
And you can set pd.options.display.max_colwidth = 0 to remove the limit altogether (in pandas 1.0 and later, the documented value for no limit is None). Fine so far!
But if the dataframe is rendered as HTML inside a notebook, the notebook wraps the table column to the width of the display, regardless of this setting.
Is there any way to avoid this, i.e. to have the HTML table column rendered as wide as necessary to fit each row on a single line?
More generally, is it possible to control the width of HTML table columns in notebook output independent of the number of characters in the pandas output?
Building on Ben's answer, but without needing to edit the custom CSS files, which work differently in JupyterLab.
Just put this in a cell and run it:
%%html
<style>
.dataframe td {
    white-space: nowrap;
}
</style>
If you create the file ~/.jupyter/custom/custom.css (for example by running atom custom.css from ~/.jupyter/custom) and put this in it:
.dataframe td {
    white-space: nowrap;
}
Then it will force the cell to show as one line, but then you get a scrolling table.
If you want it to not scroll, then set:
div.output_subarea {
    overflow-x: inherit;
}
and then it'll be as wide as it needs to be:
It's not super pretty, but I'm sure that you can tidy it up if needs be.
I found this very helpful. You'll also need to restart the notebook after you first create the css file for it to register, but from then on you can just refresh the page to see the changes to the css take effect.
This was the notebook that I was testing on.
Another option if you prefer to do it with 1 line of code or if you need to have different formats in different parts of your notebook, is to use Pandas Styler:
dfs = df.style.set_table_styles([dict(selector="td", props=[('white-space', 'nowrap')])])
display(dfs)
It is based on CSS so if in the middle of the notebook you want to go back to the previous format, you can write:
dfs = df.style.set_table_styles([dict(selector="td", props=[('overflow', 'hidden'),
('text-overflow', 'ellipsis'), ('max-width', '120px')])])

as3 remove white space

I am trying to remove or replace white space in a string in AS3. The string comes from XML and is then written into a text field. To compare the two strings, I am trying to remove the white space.
var xmlSentence:String=myXML.SENTENCE[thisSentence];
var tfSentence=e.target.text;
var rex:RegExp = /\s+/;
trace(xmlSentence.replace(rex, "-"));
trace(tfSentence.replace(rex, "-"));
That code outputs like this:
She-has a dog
-She has a dog
I also tried different regex patterns. The problem is that although both strings contain the same spaces, replace finds only one space, and not the same one in each string.
Could you help me to solve this problem
Thanks in advance
You need to use the g (global) flag so that every match is replaced, not just the first one:
var rex:RegExp = /\s+/g ;
Within your ActionScript code, select the RegExp keyword, then go to the 'Help' menu and choose 'Flash Help' for more info on flags.
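The difference between replacing the first match and replacing every match can be reproduced in Python, shown here as an analogy and not as AS3 code. Python's re.sub replaces all matches by default, so count=1 is used to imitate a pattern without the g flag. The sample string assumes, as the question's output suggests, that the text field value starts with a white-space character.

```python
import re

sentence = " She has a dog"  # leading space, like the text field value

first_only = re.sub(r"\s+", "-", sentence, count=1)  # like /\s+/ without g
all_matches = re.sub(r"\s+", "-", sentence)          # like /\s+/g
stripped = re.sub(r"\s+", "", sentence)              # remove white space to compare

print(first_only)
print(all_matches)
print(stripped)
```

This would also explain the two different outputs in the question: without g, only the first run of white space is replaced, and in the text field string that first run is at the very start.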

Crystal Report CSV export adding blank cells and unable to add quotes to numbers

A client has requested a specific CSV format for their report. When exported to CSV, blank cells are being added to the group header.
The commas in the dollar values are also being treated as delimiters, so I need to figure out a way to add quotes around them. I have tried ToText, but the formula check keeps stating that the remaining text does not appear to be part of the formula.
Here is the formula
if{bnkacrpt.trans_type}<>"OB"
then if {bnkacrpt.amount} > 0
then {bnkacrpt.amount}
else if {bnkacrpt.amount} < 0
then ({bnkacrpt.amount} * -1)
else 0
else 0
The layout (This is the only active component in the report):
The export options:
The result:
What they want:
I have tried many different variations on the export options, but am having no luck. Any help is much appreciated!
Sorry this isn't a complete answer, I would comment if I could.
ToText needs parentheses around the value you're converting.
Try this to at least get the formula working.
if {bnkacrpt.trans_type} <> "OB"
then if {bnkacrpt.amount} > 0
then ToText({bnkacrpt.amount})
else if {bnkacrpt.amount} < 0
then ToText({bnkacrpt.amount} * -1)
else ToText(0)
else ToText(0)
OK. So I had to basically hack the living daylights out of it and just make it work.
Formula (there is another formula for the description column that I did similar stuff to):
stringVar amount := totext({bnkacrpt.amount});
stringVar lessamount := totext({bnkacrpt.amount} * -1);
if{bnkacrpt.trans_type}<>"OB"
then if amount > totext(0)
then '"'+amount+'",'
else if amount < totext(0)
then '"'+lessamount+'",'
else totext(0)+','
else totext(0)+',';
In the Layout as shown in my original post, I added a single comma to the blank fields in between the data fields on the description line.
The export options:
The result:
I just need to figure out how to get rid of the empty line on top and add empty lines in between the results.
So. Close.
I also hate this client very much now.
EDIT
Finally got it right! Well, from my point of view anyways.
To add the blank lines between the details results, and under the headers line, I added a formula only with "ChrW(13)" in it and put it at the end of both lines.
/*****************************************/
To remove the most annoying blank line at the top of the resulting CSV file, I went through all the headers that were suppressed above using the section expert and ticked "Hide", "Suppress" and "Suppress Blank Section". Not sure which was the culprit, or it could have been all of them, but it works, so I don't care :)
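For comparison, this is roughly what the target format looks like when produced outside Crystal Reports. It is a Python sketch and not part of the report: the function name export_rows and the sample rows are invented, and it assumes amounts are written as absolute values with thousands separators, as in the formula above. The csv module adds quotes only where a field contains the delimiter.

```python
import csv
import io


def export_rows(rows):
    """Write (description, amount) pairs as CSV, quoting fields that contain commas."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for description, amount in rows:
        # Absolute value with thousands separators, e.g. 1234.5 becomes 1,234.50
        writer.writerow([description, "{:,.2f}".format(abs(amount))])
    return buf.getvalue()


print(export_rows([("Deposit", 1234.5), ("Fee", -20)]))
```

In this sketch the value 1,234.50 is written inside quotes because it contains a comma, while 20.00 is left unquoted.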