I'm getting errors in documents generated with python-docx, specifically if I include tables from a template

I am using python-docx to programmatically insert data into a new document. When opening the new file, I get the following error message.
Word found unreadable content in document_name. Do you want to recover the contents of this document? If you trust the source of this document, click Yes.
Here is the process that my code is going through to get to this point:
1. Copy a docx file, which we'll call our findings template, to a working folder.
2. Copy another docx file, our report document, to the same working folder.
3. Locate a table in the findings document that we want to include in the report.
4. Fill in some data in the table, and put the now-completed table into the report document.
5. Save the report document as a new file called generated.docx.
What I have figured out so far:
- If I don't fill in any information in the table and just copy it from the findings template into the report, I still get the above error message.
- If I insert other data into the report without the table from the findings template, the document is fine with no errors.
The source files have no errors; at least, Word doesn't complain when opening either the findings document or the report document.
If I let Word correct the errors, all hyperlinks in the document are broken: the text for each link is there along with the link style, but the target is missing. Looking at the document after pressing Alt+F9, you can see { HYPERLINK } fields with no target, confirming the missing targets.
After quite a bit of googling and finding some similar answers that haven't resolved the issue, I feel like this might be relevant: the tables in the findings document contain a large number of merged cells. It is only one table, not nested tables as I initially thought.
The heading is 2 rows deep, with 4 merged cells on the left for the finding title, and on the right two columns with headings and the relevant data below. The body of the table is a mixture of merged cells per row: some rows have all cells merged, others have 2 of 3 cells merged.
Here is the code I am using to snag the table from the findings document:
for table in findings_templates.tables:
    row = table.rows[0]
    for cell in row.cells:
        if title.lower() in cell.text.lower():
            severity = get_severity_from_template(table)
            for item in severity_array:
                if severity in item[1]:
                    anchor = item[0]
            # snip
            # Insert some data into table here
            # snip
            addTableAfterParagraph(report_document, table, title)
            return True
Since the errors occur with or without modification, I'll leave out the modification code. Here is the code that inserts the table into the report document:
def addTableAfterParagraph(report_document, table, title):
    for para in report_document.paragraphs:
        if para.text == title:
            p = para._p
            p.addnext(table._tbl)
Additionally, I printed table._tbl.xml for both tables, and I don't see much of a difference between the source table and the one inserted into the document, except that the first line has a few differing xmlns declarations.
I'd love some troubleshooting tips, or any suggestions. Let me know if any more information is needed. Thanks in advance!
UPDATE: It's the hyperlinks in the source table that are causing the issue. I'm marking this solved for now and may open another more specific question if I can't figure it out.

I ended up reading the data from the source document's tables, then creating my own tables programmatically and inserting that data back in, performing any transforms along the way, such as re-creating hyperlinks, applying styles, etc.
It was painful, but it ultimately solved the issue and provides flexibility in the future.
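For anyone hitting the same wall, here is a minimal sketch of that approach (the file names and the plain-text-only copying are assumptions; merged cells, hyperlinks and formatting need extra handling). The key point is that a table created with add_table belongs to the target document, so it carries none of the source part's relationship IDs, which is exactly what the broken { HYPERLINK } fields were pointing at.

from docx import Document

findings = Document("findings_template.docx")  # hypothetical paths
report = Document("report_template.docx")

def rebuild_table(source_table, target_document):
    # Re-create the table inside the target document instead of moving
    # its XML, so no relationship IDs (hyperlink targets, images) from
    # the source part leak across documents.
    new_table = target_document.add_table(
        rows=len(source_table.rows), cols=len(source_table.columns)
    )
    for r, row in enumerate(source_table.rows):
        for c, cell in enumerate(row.cells):
            # Plain text only; merged source cells repeat once per
            # spanned grid position, so real code should de-duplicate
            # them and re-apply merges, links and styles explicitly.
            new_table.cell(r, c).text = cell.text
    return new_table

for table in findings.tables:
    rebuild_table(table, report)

report.save("generated.docx")

Because the rebuilt table already belongs to the report document, moving its _tbl element after a heading with the addnext trick from the question is then safe.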

Related

Upload Text File Containing Long List of Tags to WordPress

I have a text file with about 6700 tags that I would like to add to my WordPress site. Of course, it is not efficient to do this manually. Is it possible to automate the insertion of these tags?
I tried a few plugins like Smart Tag Insert, but these are ineffective and have low review scores. Additionally, I see in my phpMyAdmin panel that tags are stored in the table wp_terms. I wanted to write an SQL script that does what I need. However, the table also stores a series of other values (like menu names), and there is no way to identify rows in this table as tags rather than something else (like the name of a menu). So I am confused about that as well.
Thank you for your time and help!
You should loop through all the tags in the file and then use the wp_insert_term function to insert each tag into the database; it takes care of the wp_terms and wp_term_taxonomy rows for you.
Documentation: https://developer.wordpress.org/reference/functions/wp_insert_term/
The first argument is the term itself and the second is the taxonomy, post_tag.
Example:
$tags = file('tags.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES); // your tag file
foreach ($tags as $tag) {
    wp_insert_term(
        trim($tag), // the term
        'post_tag'  // the taxonomy
    );
}

SSIS Errors for simple CSV Data Flow

Sorry to darken your day with my troubles, but SSIS has broken me! I am new to SSIS and I just seem to be misunderstanding it.
For background: I have a few versions of a basic package that includes a Foreach Loop container and a Data Flow with a few Derived Columns that imports CSV files into a SQL Server Staging table. It is very straightforward and does include an Execute SQL task and a File Move but those work fine. The issues are with the Foreach loop and the Data Flow.
I have one version of this package (let’s call it “A”) that seemed to be working fine. It would process multiple files in a folder, insert records into the staging table, properly execute the SQL Statements, and move the files to Archive. Everything seemed fine until I carefully QA’d the process. Turns out it was duplicating the data from one file, and never importing the data from a second Source File! Yet, the second/dupe round of data included the Source Filename (via a derived column) of the second file (but the data from the first). So it looked like I had successfully processed BOTH files until I looked at the actual data and saw that none of the values from the second source file were ever written to the Staging table.
Once I discovered this, I figured that the problem was in the Foreach loop and how I set up the different file path & name variables. So, I decided to make a new version of the package. I started by copying package A and created package B. In B, I deleted the Source Connection Manager and created a new one, along with all new file & path variables. I then tried to clean up/fix/replace various elements in my Data Flow and Foreach loop. In the process, I discovered that the Advanced Mappings from A – which DID work – were virtually all set up as String (even the Currency and Date columns). That did not seem right, so I changed each source money column to data type Currency, and each date-related column to data type Date.
What followed has been dozens and dozens of Errors and I cannot get Package B to run. I have even changed all of the B data types back to String (mirroring the setup in Package A which DID work). But, still no joy.
This leads me to ask a few questions to those of you smarter than I:
1) Why can’t SSIS interpret Source CSV data using the proper data type? I.e. why do I need to set every Input column as a STRING when some columns are clearly & completely Numeric, Currency or Dates? (Yes, the Source CSV files are VERY clean – most don’t even have NULLS)
a. When I do change the Advanced mapping for a date-related Source column to Date, I get the ever-present error message: [Flat File Source [30]] Error: Data conversion failed. The data conversion for column "Settle Date" returned status value 2 and status text "The value could not be converted because of a potential loss of data."
2) When I reset the data types back to String in package B, I still get errors – usually Truncation errors (and Yes – I have adjusted the length to 250 in one of these columns).
a. Error Message: "The value could not be converted because of a potential loss of data.".
b. When I reset the Mappings to ignore the column (as a test), it throws a similar error at the next column.
3) Any ideas why Package A would dupe a file’s data and not process the second file, yet throw no errors and move both to Archive?
4) Why does the Data Viewer appear to have parsing errors (it shows data in the wrong columns) but when you use the Copy data feature in the data viewer and paste it into Excel, all of the data lines up perfectly?
5) Are there any tips & tricks that a rookie SSIS user needs to understand and which might not be apparent through the documentation and searching web articles as well as this site?
I can provide further details if they will help, but these packages are really very simple and should not be causing me this much frustration.
THANKS for any insights.
DGP
Wow, seems like you have a lot of SSIS issues... I think the reason for the same file being extracted is the way your 'variable mappings' are defined.
Have you had a look and followed this guide:
https://www.simple-talk.com/sql/ssis/ssis-basics-introducing-the-foreach-loop-container/
Hope this helps.
Shaheen
Thanks Tab & Shaheen,
To all SSIS rookies - please learn from my mistakes!
It appears that my issue was actually in how I identified the TEXT QUALIFIER in the Connection Manager. I had entered "" (two double quotes) and that was causing problems with how my columns were being parsed. The parsing issues caused unexpected values to appear in some of the columns, and that was causing the errors in the package.
When I changed the Text Qualifier to only ONE double quote - " - the whole thing worked!
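To illustrate why that single character matters (this is plain Python with made-up sample data, not SSIS, but the parsing rule is the same): the text qualifier tells the parser which character wraps a field, so delimiters inside a wrapped field are not treated as column breaks.

import csv
import io

# A made-up row where one quoted field contains the delimiter.
sample = 'Acme Corp,"1,250.00",2021-03-15\n'

# Text qualifier = one double quote: the embedded comma stays inside
# the second field, giving the expected 3 columns.
print(next(csv.reader(io.StringIO(sample), quotechar='"')))
# ['Acme Corp', '1,250.00', '2021-03-15']

# No usable text qualifier: the quotes become literal characters and
# the embedded comma splits the field into 4 columns, which downstream
# surfaces as truncation or conversion errors.
print(next(csv.reader(io.StringIO(sample), quoting=csv.QUOTE_NONE)))
# ['Acme Corp', '"1', '250.00"', '2021-03-15']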
As I mentioned - and as Shaheen suspected - my initial issue with the duplicate processing was probably due to how I set up the Foreach loop. I had already fixed that, but was still getting errors until I fixed the Text Qualifier.
I have only tested it a few times but it looks like that was the issue.
Thanks for the contributions.
DGP

Data Services CSV Flat File there should be a column delimiter after column [n]

I'm really struggling with this one. Data Services (v14.2.3.549) keeps flagging an error saying "A column delimiter was seen after column number <80> for row number <1> in file ". It says this for what looks like every row it processes.
I've used the same settings as for all the previous files I imported, which are also CSV files. The files are exported from a web front end as Excel and then saved as CSV. I tried opening the file in Excel and clearing the empty columns after the end of the data, in case there was anything in them, then reran the job, to no avail.
I don't really know what to look for in the file, so can anyone tell me what I should be looking for so I can trace my way to the problem? The problem seems to run through this whole collection of files: if I import using a wildcard at the end of the file name, the same errors come up in the other files.
Many thanks
Andrew
I used "Adaptable Schema" set to "yes" in the file format definition to get around this error.

MediaWiki: blank all pages per namespace. I want to blank all User_talk pages

I want to know if there is a way to blank all User_talk pages en masse. Not delete them, just blank them. I don't know how to write bots, so I'm really asking if there is an extension or pre-written bot for this. Thank you
You could write a simple SQL query to do this. Just look at the page table; on my installation the namespace value for User talk: is 3, so I could delete all pages with namespace=3.
Deleting the row from the database will leave the page as blank (not created).
I suggest using AWB. You can easily have it build a list based on a namespace and then use a simple regex replace such as: Search: (.*)* Replace with: (empty space).
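If you're open to running a small script rather than raw SQL, Pywikibot (the standard MediaWiki bot framework) can blank a namespace through the API, which keeps page history intact instead of deleting rows. A minimal sketch, assuming Pywikibot is already configured for your wiki:

import pywikibot

site = pywikibot.Site()

# Namespace 3 is User talk on a default MediaWiki install.
for page in site.allpages(namespace=3):
    page.text = ""
    page.save(summary="Blanking User talk pages")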

SSRS Text Data Output - Header and Details on the same row?

I have turned headers off in the Report Server config file and am attempting to output the header rows above the details in CSV output. What is happening instead is that the header displays on the same line as the details. If I add another table with the header row in it, it works, but leaves a one-row gap between the header and the content. Any help getting this data to line up correctly would be greatly appreciated.
You could investigate using XSLT to transform the XML output into the desired format. This is really the only option I know of, and the one I've used in the past, for making a custom CSV-type output. You could then undo the alteration to the server-wide (?) config file, as the XSLT file would be applied to just that report, making it easier to deploy.
http://msdn.microsoft.com/en-us/library/ms159716(v=sql.90).aspx
(There are probably more up-to-date links out there; just Google/Bing "SSRS XSLT" etc.)