python-docx: reading the font size returns None

My task is to check several docx documents for font properties such as font name, size, bold, italic, etc. I do not want to change anything, just validate that the values are correct.
I created a small test document with the text "Foo-bar" and changed the font name from the default to Garamond, but did not change the font size.
I am able to read the font name from the document, but when I try to read the font size, I get None.
from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH

def check_para_props(doc, paralist, ffamily, fsize, paligin):
    pargs = doc.paragraphs
    for i in paralist:
        para = pargs[i]
        print(i, para.text)
        for run in para.runs:
            print(run.font.name, run.font.size)

worddoc = Document('Foo-bar.docx')
# pass the document, the paragraphs list which should be tested
# and the expected values
check_para_props(worddoc, (0, ), 'Garamond', 11, WD_ALIGN_PARAGRAPH.JUSTIFY)
Output:
0 Foo-bar
Garamond **None**
I assume docx has default values for unchanged properties such as font size, but I have not found how to read them from the document. I could not find an API call for this in the python-docx documentation: https://python-docx.readthedocs.io/en/latest/index.html
I have checked the styles section, which is readable, but I don't think the size is set there.
Word can export the document in XML format, but I am still unsure how to read the font size from it.
In this article I read about the inheritance of properties: https://www.toptal.com/xml/an-informal-introduction-to-docx
It says that to get the end result of a character's properties you should:
Use default run/paragraph properties
Append run/paragraph style properties
Append local run/paragraph properties
Append result run properties over paragraph properties
OK, but how can I read the default properties using python-docx?
Please advise!
Thank you in advance!
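The inheritance steps listed above can be sketched roughly as follows. This is only a minimal sketch, not an official python-docx recipe: the helper names effective_font_size and docdefault_font_size are hypothetical, and it assumes the size is set either directly on the run, somewhere in the character or paragraph style chain (followed via base_style), or in the w:docDefaults element of styles.xml, which python-docx does not expose through its high-level API and which is read here with an XPath query on the underlying XML.

```python
def effective_font_size(run, paragraph):
    """Return the first explicitly set font size, or None if nothing sets it.

    Lookup order (most specific first): the run's direct formatting,
    the run's character style chain, then the paragraph style chain.
    """
    if run.font.size is not None:
        return run.font.size
    for style in (run.style, paragraph.style):
        # Follow "based on" links until a style defines the size.
        while style is not None:
            if style.font.size is not None:
                return style.font.size
            style = style.base_style
    return None


def docdefault_font_size(doc):
    """Fallback: read w:sz from w:docDefaults in styles.xml (an assumption
    about where the document-wide default lives; w:sz is in half-points)."""
    from docx.shared import Pt  # imported lazily; needs python-docx

    values = doc.styles.element.xpath(
        "w:docDefaults/w:rPrDefault/w:rPr/w:sz/@w:val"
    )
    return Pt(int(values[0]) / 2) if values else None
```

With the question's document this might be used as size = effective_font_size(run, para) or docdefault_font_size(worddoc), and size.pt then gives the value in points. Theme fonts and table styles are not handled by this sketch.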

Related

Selenium, using find_element but end up with half the website

I finished the linked tutorial and tried to modify it to get something else from a different website. I am trying to get the margin table of HHI, but the website is coded in such a strange way that I am quite confused.
I can find the child element of that parent with the XPath //a[@name="HHI"]; its parent is <font size="2"></font>, which contains the text I want, but there are many tags that look exactly like <font size="2"></font>, so I can't just use //font[@size="2"].
Using the full XPath prints out half of the website's content.
The full XPath:
/html/body/table/tbody/tr/td/table/tbody/tr/td/table/tbody/tr[3]/td/pre/font/table/tbody/tr/td[2]/pre/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font
Is there any way to select that particular font tag and print its text?
website:
https://www.hkex.com.hk/eng/market/rm/rm_dcrm/riskdata/margin_hkcc/merte_hkcc.htm
Tutorial
https://www.youtube.com/watch?v=PXMJ6FS7llk&t=8740s&ab_channel=freeCodeCamp.org
from selenium import webdriver  # needed for webdriver.Chrome below
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
import pandas as pd
# prepare it to automate
from datetime import datetime
import os
import sys
import csv
application_path = os.path.dirname(sys.executable) # export the result to the same folder as the executable
now = datetime.now() # for modify the export name with a date
month_day_year = now.strftime("%m%d%Y") # MMDDYYYY
website = "https://www.hkex.com.hk/eng/market/rm/rm_dcrm/riskdata/margin_hkcc/merte_hkcc.htm"
path = "C:/Users/User/PycharmProjects/Automate with Python – Full Course for Beginners/venv/Scripts/chromedriver.exe"
# headless-mode
options = Options()
options.headless = True
service = Service(executable_path=path)
driver = webdriver.Chrome(service=service, options=options)
driver.get(website)
containers = driver.find_element(by="xpath", value='') # or find_elements
hhi = containers.text # if using find_elements, = containers[0].text
print(hhi)
Update:
Thanks to Conal Tuohy, I learned a few new XPath tricks. The website is written in such a strange way that even with an XPath that locates the exact font tag, the result still includes all the text in every following tag.
I made a list of the different products with .split("Back to Top"), then sliced out the first item and used .split("\n"). I will keep splitting the nested lists until the data fits neatly into a DataFrame with strike prices as the index and maturity dates as the columns.
Probably not the most efficient way but it works for now.
product = "HHI"
containers = driver.find_element(by="xpath", value=f'//font[a/@name="{product}"]')
hhi = containers.text.split("Back to Top")
# print(hhi)
hhi1 = hhi[0].split("\n")
df = pd.DataFrame(hhi1)
# print(df)
df.to_csv(f"{product}_{month_day_year}.csv")
You're right that HTML is just awful! But if you're after the text of the table, it seems to me you ought to select the text node that follows the b element that follows the a[@name="HHI"]; something like this:
//a[@name="HHI"]/following-sibling::b/following-sibling::text()[1]
EDIT
Of course that XPath won't work in Selenium because it identifies a text node rather than an element. So your best option is to return the font element that directly contains the //a[@name="HHI"], which will include some cruft (the Back to Top link, etc.) but which will at least contain the tabular data you want:
//a[@name="HHI"]/parent::font
i.e. "the parent font element of the a element whose name attribute equals HHI"
or equivalently:
//font[a/@name="HHI"]
i.e. "the font element which has, among its child a elements, one whose name attribute equals HHI"
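To see what these expressions select without opening a browser, here is a small sketch using Python's standard xml.etree.ElementTree on a made-up, well-formed fragment. The SAMPLE markup is hypothetical and much simpler than the real page. ElementTree supports only a subset of XPath, so the sketch uses the "parent of the a element" form (written with ..) rather than //font[a/@name="HHI"], and it reads the text node after the b element through the element's tail attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the page structure.
SAMPLE = """
<pre>
  <font size="2"><a name="HSI"/><b>HSI</b> other data<a href="#top">Back to Top</a></font>
  <font size="2"><a name="HHI"/><b>HHI</b> strike data here<a href="#top">Back to Top</a></font>
</pre>
""".strip()

root = ET.fromstring(SAMPLE)

# "The parent of the a element whose name attribute equals HHI".
font = root.find(".//a[@name='HHI']/..")

# All text inside that font element, cruft included; this is roughly
# what reading .text on the element returns in Selenium.
all_text = "".join(font.itertext())

# Only the text node that directly follows the b element.
table_text = font.find("b").tail.strip()

print(font.tag, font.attrib)
print(repr(all_text))
print(repr(table_text))
```

In Selenium itself the element still has to be selected first and its text trimmed afterwards, as in the update above.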

How to add auto-complete in Sublime Text 3

I would like to add custom auto-complete key bindings much like the built-in ones.
Example: html+tab auto-completes the Doctype block.
I tried adding a custom HTML key binding: type c + o + l + tab to generate <div class="col-">
Preferences > Key Bindings > Default (OSX).sublime-keymap -- User
{"keys": ["c+o+l+tab"], "command": "insert_snippet", "args": {"contents": "<div class=\"col-$0\">"}},
However, there are two issues:
the new key binding overrides all other auto-completes
the initial col characters remain in front of the generated tag: col<div class="col-">
What is the correct way to add this type of key binding?
The correct way to do something like this is to use either snippets or completions. Although there are some differences, generally speaking they both work the same way in the end, and which one you choose depends on how many such items you want to create and how complex you want them to be.
Using a snippet, you would select Tools > Developer > New Snippet... from the menu and fill out the snippet template, then save it as a sublime-snippet file in the location that Sublime defaults to (which is your User package).
For example, that might look like the following based on the example in your question:
<snippet>
<content><![CDATA[
<div class="col-$0">
]]></content>
<description>Insert DIV with column class</description>
<tabTrigger>col</tabTrigger>
<scope>text.html</scope>
</snippet>
Snippets are XML formatted, and everything between <![CDATA[ and ]]> is inserted into the buffer (don't remove the CDATA even if you think you don't need it; Sublime will ignore the snippet if you do).
The tabTrigger specifies the text that you want to be the trigger for the snippet, the scope says what sort of files the snippet should trigger in, and the description will be displayed next to the snippet in the auto-completions panel.
In a snippet, the tabTrigger, scope and description are all optional. If you don't specify a tabTrigger you can only expand the snippet from the Command Palette or via the insert_snippet command (for example in a key binding). Without a scope the snippet applies everywhere, and without description it has no description in the panel.
If you have many such items that you want to add snippets for, you can also use completions instead. These are stored in JSON files with an extension of sublime-completions and should be saved in your User package (use Preferences > Browse Packages... if you don't know where that is).
An example of such a file would be:
{
"scope": "text.html",
"completions": [
{ "trigger": "col\tInsert DIV with column class", "contents": "<div class=\"col-$0\">" },
]
}
In this format, the trigger is always the text to trigger and the description (still optional) is separated from the trigger by a \t character in the trigger key.
In completions you only specify the scope once at the top instead of every time, but there are some functional differences between completions and snippets.
There can only be one snippet per sublime-snippet file, but a sublime-completions file can contain many completions in a single file; the completions key is an array so you can place more than one completion in the same file.
Completions are JSON, so contents that are multi line or contain JSON specific characters such as a " character are harder to enter; completions are better for shorter sequences while snippets are better for more complex things.
When autocomplete triggers, if there is a completion and a snippet that could be autocompleted, snippets always "win" and are inserted, whereas completions cycle. That means that in this particular example you need to press Tab twice, because col is also the name of a tag.
Snippets automatically appear in the command palette (when they apply) but completions do not. In the command palette, Snippets appear as commands like Snippet: Something, where Something is the description if it exists and the name of the file if it does not.
In either case, you can make the snippet/completion apply only in certain types of files by applying a scope; to determine the appropriate scope, position the cursor in a file at the appropriate place and select Tools > Developer > Show Scope Name...; the more of the displayed scope you use the more specific it becomes. Generally just the top level such as text.html is all that's needed unless you're doing something special.
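Because completions are JSON, quoting contents that contain " characters by hand gets tedious once there are many entries. One possible workaround, sketched here in Python, is to generate the sublime-completions text with the standard json module and let it do the escaping. The function name build_completions and the file name mentioned afterwards are hypothetical; the structure of the output simply mirrors the example file shown above.

```python
import json


def build_completions(scope, items):
    """Build the text of a sublime-completions file.

    items is an iterable of (trigger, description, contents) tuples;
    the description may be an empty string, in which case the trigger
    is used alone.
    """
    completions = []
    for trigger, description, contents in items:
        # The description is joined to the trigger with a tab character.
        key = f"{trigger}\t{description}" if description else trigger
        completions.append({"trigger": key, "contents": contents})
    return json.dumps({"scope": scope, "completions": completions}, indent=4)


text = build_completions(
    "text.html",
    [("col", "Insert DIV with column class", '<div class="col-$0">')],
)
print(text)
```

The printed text could then be saved in the User package under a name such as html-columns.sublime-completions (the name itself is arbitrary).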

In R package Formattable, how to apply digits and conditional formatting at the same time?

I have the object TABLE_LIST, which is a list of tables (I can't provide the contents for privacy reasons, sorry).
I first created the object TABLE_LIST (it is a list of 2x12 data.frames):
TABLE_LIST=lapply(1:4, function(x) data.frame(rbind(total.ratio4[[x]][-(1)], total.ratio2[[x]][-(1)]), row.names=row))
The following code gives me red and green font colors based on the value on the cell, and it works like a charm:
formattable(TABLE_LIST[[1]], list(area(,-(c(5,10)))~formatter("span", style=x~style(color=ifelse(x>1,"red","green"))),area(,(c(5,10)))~formatter("span", style=x~style(color=ifelse(x>1,"green","red")))))
However, I need COLOR AND comma separated numbers. My failed attempt is:
formattable(TABLE_LIST[[1]], list(area(,-(c(5,10)))~formatter("span", style=x~style(color=ifelse(x>1,"red","green"))),area(,(c(5,10)))~formatter("span", style=x~style(color=ifelse(x>1,"green","red"),digits(x,2))),
area(1:2,1:10)~formatter("span",x~ style(digits(x,2)))))
This code runs without errors, but it erases the color formatting. I do not know what else to try.
I should mention that I cannot change the original data.frame without messing everything up, so I have to make the changes on TABLE_LIST or in formattable. Thank you.
I think I solved it. So I will share this small knowledge to people who may have the same problems as me:
formattable(TABLE_LIST[[1]],
list(
area(,-(c(5,10)))~formatter("span",
style=x~style(color=ifelse(x>1,"red","green")),
x~style(digits(x,4))),
area(,(c(5,10)))~formatter("span",
style=x~style(color=ifelse(x>1,"green","red")),
x~style(digits(x,4)))))
Basically, inside the same formatter, on the level of style, add a comma and x~style.

Displaying and interpreting tab layouts

I would like to know if it is possible to 'see' and display the following tab layout, maybe through the Attribute Editor or similar.
Or how can I interpret it?
In the following, I selected the shader ShaderParam_resGen_srf01, but after searching through every attribute I can find in the Attribute Editor, I can find neither the CachedLayouts nor the ShaderParamTabDepth elements.
Any ideas?
tabLayout -e -selectTabIndex 1 "MayaWindow|MainAttributeEditorLayout|formLayout2|AEmenuBarLayout|AErootLayout|AEStackLayout|AErootLayoutPane|AEbaseFormLayout|AEcontrolFormLayout|AttrEdrexShaderSrfFormLayout|scrollLayout121|columnLayout971|frameLayout522|columnLayout976|columnLayout977|MW_ShaderParam_CachedLayouts|MW_ShaderParam_resGen_srf01|ShaderParamTabDepth0";
tabLayout is a UI element, not part of your scene.
From the documentation, this command is selecting the first tab of the specified tab layout control.
The long string is the "path" to the control:
MayaWindow
MainAttributeEditorLayout
formLayout2
AEmenuBarLayout
AErootLayout
AEStackLayout
AErootLayoutPane
AEbaseFormLayout
AEcontrolFormLayout
AttrEdrexShaderSrfFormLayout
scrollLayout121
columnLayout971
frameLayout522
columnLayout976
columnLayout977
MW_ShaderParam_CachedLayouts
MW_ShaderParam_resGen_srf01
ShaderParamTabDepth0
Depending on what you intend by "interpreting tab layouts," other commands listed in the documentation linked above should help you collect the specific information you need. If there's a particular aspect of the layout you want to query, be sure to specify that in your question.
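As a rough illustration of the above, the control path can be split on the | separator, and the tab layout can be queried from Python inside Maya. This is only a sketch: the helper names are hypothetical, the SAMPLE_PATH is just the first few components of the path from the question, and describe_tab_layout only works in a running Maya session because it imports maya.cmds.

```python
def split_control_path(path):
    """Split a full UI control path into its parent/child components."""
    return path.split("|")


def describe_tab_layout(path):
    """Query a tabLayout control; must be run inside Maya."""
    from maya import cmds  # only importable in a Maya session

    if not cmds.tabLayout(path, exists=True):
        return None
    return {
        "tabs": cmds.tabLayout(path, query=True, tabLabel=True),
        "children": cmds.tabLayout(path, query=True, childArray=True),
        "selected_index": cmds.tabLayout(path, query=True, selectTabIndex=True),
    }


# Shortened path used for illustration only.
SAMPLE_PATH = "MayaWindow|MainAttributeEditorLayout|formLayout2"
parts = split_control_path(SAMPLE_PATH)
print(parts)
```

The last component of the full path is the tab layout control itself; everything before it is the chain of parent layouts.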

Set TCAdefaults in TYPO3 Page TS config for a specific page type only

I can disable fields in the Page TS Config depending on the selected content type. For example:
TCEFORM.tt_content.header.types.gridelements_pi1.disabled = 1
When I try to set a default value for a specific content type this does not work:
TCAdefaults.tt_content.header_layout.types.gridelements_pi1 = 100
Does anybody know how to set a default value for a specific content type?
It does not seem possible to set a default value for a specific content element. The Page TSconfig reference doesn't mention such an option.
It is possible to manipulate the options array to some extent:
See here for possible actions: http://docs.typo3.org/TYPO3/TSconfigReference/PageTsconfig/TCEform/Index.html#pagetceformconfobj
You could experiment with the itemsProcFunc option, but I have no idea if it's useful.