Create Word fields in python-docx to automate the footer from page 2 onward of a Word document (e.g. # Page / # Total Pages)? - python-docx

I'm trying to automate Word file generation with python-docx. I created a template containing the first page with its specific headers and footers; now I only need to implement this footer, starting from the second page:
Name Page # / Total Pages Date
All I want to know is how to configure this footer for all pages starting from the second, without affecting the headers or footers of the first page of the template. My problem is: can I create Word fields via python-docx (e.g. the Page and NumPages fields for the middle part; that's the only problem)?
If not, can I somehow insert only the "Name" and "Date" parts into the footers from page 2 onward, without affecting the middle part, which I would then prepare in the template as well?
The code I tried, run on a template that already contained the Page/NumPages fields in the middle of the footer, deleted the middle part. The result was a file with "Hello" written in the lower left corner of the footer, and the Page/NumPages fields of the template (Hello.docx) had disappeared:
from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH
doc = Document("Hello.docx")
footer = doc.sections[0].footer
parag = footer.paragraphs[0]
parag.text = "Hello"
parag.alignment = WD_ALIGN_PARAGRAPH.LEFT
parag.style = doc.styles["Footer"]
doc.save("Hello2.docx")

Related

Selenium, using find_element but end up with half the website

I finished the linked tutorial and tried to modify it to get something else from a different website. I am trying to get the margin table of HHI, but the website is coded in such a strange way that I am quite confused.
I can find the child element with the XPath //a[@name="HHI"]. Its parent is <font size="2"></font>, which contains the text I want, but there are a lot of tags that look exactly like <font size="2"></font>, so I can't just use the XPath //font[@size="2"].
Using the full XPath prints out half of the website's content.
the full xpath:
/html/body/table/tbody/tr/td/table/tbody/tr/td/table/tbody/tr[3]/td/pre/font/table/tbody/tr/td[2]/pre/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font
Is there any way to select that particular font tag and print the text?
website:
https://www.hkex.com.hk/eng/market/rm/rm_dcrm/riskdata/margin_hkcc/merte_hkcc.htm
Tutorial
https://www.youtube.com/watch?v=PXMJ6FS7llk&t=8740s&ab_channel=freeCodeCamp.org
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
import pandas as pd
# prepare it to automate
from datetime import datetime
import os
import sys
import csv
application_path = os.path.dirname(sys.executable) # export the result to the same folder as the executable
now = datetime.now() # used to add the date to the export file name
month_day_year = now.strftime("%m%d%Y") # MMDDYYYY
website = "https://www.hkex.com.hk/eng/market/rm/rm_dcrm/riskdata/margin_hkcc/merte_hkcc.htm"
path = "C:/Users/User/PycharmProjects/Automate with Python – Full Course for Beginners/venv/Scripts/chromedriver.exe"
# headless-mode
options = Options()
options.headless = True
service = Service(executable_path=path)
driver = webdriver.Chrome(service=service, options=options)
driver.get(website)
containers = driver.find_element(by="xpath", value='') # or find_elements
hhi = containers.text # if using find_elements, = containers[0].text
print(hhi)
Update:
Thanks to Conal Tuohy, I learned a few new XPath tricks. The website is written in such a strange way that even with an XPath that locates the exact font tag, the result still includes all the text in every following tag.
I made a list of the different products with .split("Back to Top"), then took the first item and used .split("\n"). I will keep splitting the lists within the list until the data fits neatly into a dataframe with strike prices as the index and maturity dates as the columns.
It's probably not the most efficient way, but it works for now.
product = "HHI"
containers = driver.find_element(by="xpath", value=f'//font[a/@name="{product}"]')
hhi = containers.text.split("Back to Top")
# print(hhi)
hhi1 = hhi[0].split("\n")
df = pd.DataFrame(hhi1)
# print(df)
df.to_csv(f"{product}_{month_day_year}.csv")
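The splitting steps described in the update could be sketched roughly as follows. SAMPLE_TEXT is a made-up stand-in for containers.text (the real page layout may well differ), first_block_rows is a hypothetical helper, and the standard csv module is used here instead of pandas only to keep the sketch self-contained.

```python
import csv
import io

# Hypothetical stand-in for containers.text; the real columns may differ.
SAMPLE_TEXT = """HHI Futures
Month Rate
JAN-24 100
FEB-24 110
Back to Top
HSI Futures
Month Rate
JAN-24 200
"""


def first_block_rows(text, marker="Back to Top"):
    """Keep the text before the first marker and split it into rows of cells."""
    block = text.split(marker)[0]
    # Skip blank lines, then split each remaining line on whitespace.
    return [line.split() for line in block.splitlines() if line.strip()]


rows = first_block_rows(SAMPLE_TEXT)

# Write the rows as CSV into an in-memory buffer instead of a file.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
```

Splitting on whitespace only works if no cell contains a space; a title line such as "HHI Futures" ends up as two cells and would need separate handling.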
You're right that HTML is just awful! But if you're after the text of the table, it seems to me you ought to select the text node that follows the b element that follows the a[@name="HHI"]; something like this:
//a[@name="HHI"]/following-sibling::b/following-sibling::text()[1]
EDIT
Of course that XPath won't work in Selenium because it identifies a text node rather than an element. So your best result is to return the font element that directly contains the //a[@name="HHI"], which will include some cruft (the Back to Top link, etc.) but which will at least contain the tabular data you want:
//a[@name="HHI"]/parent::font
i.e. "the parent font element of the a element whose name attribute equals HHI"
or equivalently:
//font[a/@name="HHI"]
i.e. "the font element which has, among its child a elements, one whose name attribute equals HHI"
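To see the parent-selection idea in isolation, here is a small sketch using Python's standard xml.etree.ElementTree on a made-up, well-formed fragment (SAMPLE is hypothetical and much simpler than the real page). ElementTree supports only a subset of XPath, so the sketch uses the ".." parent step, which plays the role of parent::font; predicates such as font[a/@name="HHI"] need a full XPath 1.0 engine like the browser's. In ElementTree, the text node right after the b element is exposed as its tail attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment imitating nested <font size="2"> tags.
SAMPLE = """<pre><font size="2"><a name="HSI"/><b>HSI Futures</b>
10 20 30
<font size="2"><a name="HHI"/><b>HHI Futures</b>
40 50 60
70 80 90
<a href="#top">Back to Top</a></font></font></pre>"""

root = ET.fromstring(SAMPLE)

# Parent of the <a name="HHI"> anchor: the innermost enclosing font element.
font = root.find(".//a[@name='HHI']/..")

# The text that directly follows the <b> heading is stored as b.tail.
heading = font.find("b")
table_text = heading.tail.strip()
print(table_text)
```

Selenium can only return elements, not text nodes, which is why the answer falls back to the enclosing font element there.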

Can I add Heading text to the footer in a Google Docs file automatically?

I have a document that consists of approximately 20 chapters spread over 60 pages. Each new chapter starts at the top of a new page. What I would like to do is automatically add the active chapter title to the footer of each page. I know this behavior is possible in Microsoft Word, but I cannot find it in Google Docs.
It can be done manually by inserting section breaks, but that is inconvenient for me, since I want to use this process in over 1,000 different documents.
Example:
Chapter 1 is called "Test chapter" and starts at page 1
Chapter 2 is called "Another chapter" and starts at page 4
Then on pages 1, 2 and 3 the footer should contain the text "Test chapter". On page 4, the footer should contain the text "Another chapter".
Thank you in advance!
Unfortunately, this isn't available at the moment, and there has been no update since the issue was reported.
I have been trying to work around this via Apps Script but never found a proper solution, because the footers and headers are treated as one.
By default, changing one footer section applies to all footer sections. If they are separated by section breaks, they are ordered not by page but by the order in which they were added to the document, so this is tricky: you would have to add them in that order for my code to work.
Object Order:
After BODY_SECTION, the next children of the document should alternate between HEADER_SECTION and FOOTER_SECTION.
NOTE:
The Logger.log call in the code shows the object order of your document. If it matches the order above, the code below will work.
Code:
function setHeaderAsFooter() {
  var d = DocumentApp.getActiveDocument();
  var p = d.getBody().getParent();
  for (var i = 0; i < p.getNumChildren(); i += 1) {
    var c = p.getChild(i);
    var t = c.getType();
    // Log the type of each object.
    // Comment out the if-block below to only inspect the object order in your document.
    Logger.log(t);
    // For every header encountered, copy its text into the next object as the footer text
    if (t === DocumentApp.ElementType.HEADER_SECTION) {
      var f = c.getNextSibling();
      f.asFooterSection().setText(c.asHeaderSection().getText());
    }
  }
}
If they aren't in HEADER -> FOOTER alternating order after the BODY, then the code above won't work.
Most likely, this is not the answer you were hoping for, so I do sincerely apologize. I hope that this answer will help you in any way at least.

Weasyprint not printing the pages spanning over more than one page

This is the code I am using to convert a rendered HTML page to PDF.
html_string = render_to_string("monthly_report/generated_pdf.html",
{'form': form, 'month': monthStr.strftime("%B %Y")})
html = HTML(string=html_string)
main_doc = html.render()
pdf = main_doc.write_pdf()
return HttpResponse(pdf, content_type='application/pdf')
A row whose data spans more than one page is truncated, and the data is lost.
What I want is for the remaining data to be printed on the next page.
How can this be achieved?
Please help, I have been stuck on this since yesterday.
This problem is caused by bug #36: WeasyPrint is not able to split table cells yet.

SSRS Repeating certain groups and pages

I'm struggling with a problem in SSRS. I have created a customer invoice that looks good in the report viewer; however, it needs to be set up to print in a certain way.
There are 4 main elements to this report.
Header, this needs to repeat on every other page if the invoice details + footer do not fit on the first page.
Invoice details, this needs to repeat on every other page if the invoice details and footer do not fit on the first page.
Footer, this needs to repeat on every other page if the invoice details and footer do not fit on the first page.
Back of page (payment details, like a bank statement), this needs to repeat on every other page without the header, invoice details or footer.
Is this even possible? If not, the end user has accepted that the first 3 parts of the invoice repeat as necessary and only the last page contains the payment details.
Thanks in advance
Getting the report header and footer to repeat on every page should be pretty straightforward.
Now if you have some additional information outside of the report content that you wish to repeat on every page, you could do the following:
As you are probably already aware, when using a Tablix it's possible to repeat table header rows on each page. This can be used to our advantage by adding a Tablix with a single column and making it span the size of the page; in both the header and data rows you add rectangles so that it acts like the report body. In the header row you can add any data or text you wish to repeat on the following pages.
Now, as you want the back side of the pages to have text on them, you probably don't want this to repeat on every page. Because the back of the pages is always the same static data, you could simply generate your report the way it's set up right now and insert the static page between the pages of the report.
To achieve the last part you could use some code like this:
String inputFilePath1 = @""; // back of page
String inputFilePath2 = @""; // report
String outPutFilePath = @""; // final report
PDFDocument doc1 = new PDFDocument(inputFilePath1);
PDFDocument doc2 = new PDFDocument(inputFilePath2);
// Get a page from the first document. -> back of page
PDFPage page = (PDFPage)doc1.GetPage(0);
for (int i = 1; i <= doc2.PageCount; i++)
{
    if (i % 2 == 1)
    {
        // Insert the page into the second document at the specified position.
        doc2.InsertPage(page, i);
    }
}
// Output the new document.
doc2.Save(outPutFilePath);
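The interleaving logic of that loop, independent of any PDF library, can be sketched in Python with plain lists standing in for pages. interleave_back_page is a hypothetical name, and the sketch assumes, as the loop above appears to, that every report page gets the static page inserted directly after it.

```python
def interleave_back_page(report_pages, back_page):
    """Return a new page list with back_page placed after every report page."""
    result = []
    for page in report_pages:
        result.append(page)       # front side: the actual report page
        result.append(back_page)  # back side: the static payment-details page
    return result


# Example with page labels instead of real PDF pages.
pages = interleave_back_page(["invoice-1", "invoice-2", "invoice-3"], "payment-details")
print(pages)
```

Building a new list avoids inserting into the collection while iterating over it, which is what makes the index arithmetic in the in-place version easy to get wrong.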

VBA Word Asynchronous Execution

I have a form in MS Access that allows users to enter data they collect from the field and that form also has the option to compile all of the information into a formal report. The report contains a cover sheet and a table of contents as well as leaves section header pages for additional documents to be attached when printed out/exported.
There are 2 things that execute before their processes are actually finished:
One subroutine creates many formatted tables, but the tables are initially created with only their data; the formatting is not applied right away. It finally kicks in once the document has finished being written, and it then deletes any extra pages. This affects the second problem.
Since the page numbering for each page is not the same, sections are used so that each page can have a unique footer with the page number included in that. A loop is used to run through the document and unlink all headers and footers from the previous ones. It then starts from the beginning of the document and moves from footer to footer and writes the page number. That code is below:
While Not Selection.Information(wdActiveEndPageNumber)
    If Selection.Information(wdActiveEndPageNumber) = (Section_Page + 1) Then
        Selection.TypeText "Page: " & (pgNum + Section_Length)
        pgNum = pgNum + Section_Length
    Else
        Selection.TypeText "Page: " & pgNum
    End If
    pgNum = pgNum + 1
    ActiveWindow.ActivePane.View.NextHeaderFooter 'move to the next page's footer
Wend
The problem that I am having with this section of code is that the Selection does not always move to the next footer fast enough and as a result footers that belong on the next page sometimes cram onto the same page as another footer and the footer looks something like "Page: 5Page: 6" rather than "Page: 5" on one and "Page: 6" on the next.
Please do not suggest the built in Word page numbering - I shortened the code here, there are anywhere between 3 and 7 sections that need spacing. I think if there was a way to get the code to execute asynchronously that block of code will work.
A stop-gap measure would be to insert one (or several) line(s) of
DoEvents
after the change and before ActiveWindow....NextHeaderFooter. That command yields execution to the OS. That may give Word the time it needs to catch up.
Of course, you would do better to avoid using ActiveWindow... altogether and iterate through the sections with a For loop.